Chapter 3. Functions

Why Functions?

The most important concept in Python and arguably in all of programming is that of a function. Functions are reusable blocks of code.

Consider the following example. We are writing an application for handling tournaments and want to compute player ratings using the Elo rating system. This system works by first calculating the probabilities of the players winning (i.e. the expected scores) and then updating the players’ ratings from those probabilities and their old ratings.

Consider an example with two players, where the first player has the rating 1000 and the second player has the rating 1500. Let’s store the rating of the first player in a variable rating1 and the rating of the second player in a variable rating2:

rating1 = 1000
rating2 = 1500

We can calculate the expected score of the first player using this formula:

expected1 = 1 / (1 + 10 ** ((rating2 - rating1) / 400))
expected1
0.05324021520202244

Don’t spend too much time worrying about the details of the formula. Basically, the larger the difference rating2 - rating1, the lower the expected score of the first player. After all, a high value of rating2 - rating1 means that the first player has a substantially lower rating than the second player, i.e. we don’t expect the first player to achieve a high score.

Additionally, the score always lies between 0 and 1. An expected score close to 0 means that the first player has almost no chance of winning. An expected score close to 1 means that the first player is almost certain to win.

In this particular case we see that the expected score of the first player is very small (around 0.05), which makes sense since the first player has a much lower rating than the second player.

The expected score of the second player can be calculated in a similar manner:

expected2 = 1 / (1 + 10 ** ((rating1 - rating2) / 400))
expected2
0.9467597847979775

This expected score is very high, which again makes sense as the second player has a much higher rating than the first player.
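As a quick sanity check on the formula, the two expected scores should add up to 1, and two equally rated players should each get an expected score of 0.5. The snippet below is a minimal sketch of that check. The names total and expected_equal, as well as the rating 1200, are made up for this illustration.

```python
import math

rating1 = 1000
rating2 = 1500

expected1 = 1 / (1 + 10 ** ((rating2 - rating1) / 400))
expected2 = 1 / (1 + 10 ** ((rating1 - rating2) / 400))

# The two expected scores are complementary (up to floating point error)
total = expected1 + expected2
print(math.isclose(total, 1.0))  # True

# With equal ratings the exponent is 0, so the score is 1 / (1 + 1)
expected_equal = 1 / (1 + 10 ** ((1200 - 1200) / 400))
print(expected_equal)  # 0.5
```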

Let’s say that the first player surprises everyone and wins the game. In that case the rating update can be performed like this:

new_rating1 = rating1 + 32 * (1 - expected1)
new_rating2 = rating2 + 32 * (0 - expected2)

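Put simply, the winner’s rating increases by an amount proportional to 1 minus the winner’s expected score, so the less expected the win, the larger the gain. The loser’s rating decreases by an amount proportional to the loser’s expected score. The factor 32 limits how many points can change hands in a single game.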

Inspecting the new rating gives us the following result:

new_rating1
1030.2963131135352
new_rating2
1469.7036868864648

We can see that the rating of the first player has increased quite substantially since he won against a stronger player. Meanwhile the rating of the second player has decreased (also quite substantially) since he lost against a weaker player.
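One way to convince yourself that the update is balanced is to compare the points gained by the winner with the points lost by the loser. The sketch below assumes the same ratings as above. The names K_FACTOR, gain and loss are hypothetical and only used for this illustration (32 is the constant from the update formula).

```python
import math

K_FACTOR = 32  # hypothetical name for the constant 32 in the update formula

rating1 = 1000
rating2 = 1500
expected1 = 1 / (1 + 10 ** ((rating2 - rating1) / 400))
expected2 = 1 / (1 + 10 ** ((rating1 - rating2) / 400))

# The first player wins (score 1), the second player loses (score 0)
new_rating1 = rating1 + K_FACTOR * (1 - expected1)
new_rating2 = rating2 + K_FACTOR * (0 - expected2)

gain = new_rating1 - rating1  # points won by the first player
loss = rating2 - new_rating2  # points lost by the second player

# The points simply change hands (up to floating point error)
print(math.isclose(gain, loss))  # True
```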

Let’s find out what happens if the first player loses the game:

rating1 = 1000
rating2 = 1500

new_rating1 = rating1 + 32 * (0 - expected1)
new_rating2 = rating2 + 32 * (1 - expected2)
new_rating1
998.2963131135352
new_rating2
1501.7036868864648

This makes sense, too. If the second player wins, his rating increases. However, the increase is very small, because the second player is much stronger than the first player according to the ratings and so his win was expected.

We can already see how duplicate code is creeping into our application. After all, this calculation would have to be done quite often in our tournament application, which would result in a lot of tedious repetition of code.

This leads to several problems.

First of all, we would have to repeat the exact same code over and over again. This costs time, and with each repetition we become more likely to introduce a mistake somewhere.

Second, if we want to change the rating calculation at some point, we would need to adjust it in every place the calculation happens. This, again, costs time, and if we forget to do it somewhere, some of the code will still be using the old calculation, which will result in bugs.

Instead of going through all this tedium, we can define the calculation once inside a function. Then every time we need to perform the calculation, we will simply use the function.

This fixes both our problems.

First, from now on we only need to write the code once (namely inside the function).

Second, if we want to change the rating calculation, we only need to update the function - all code that uses the function will automatically use the new rating calculation.

Defining and Calling Functions

We define a function using the def keyword followed by the function name, parentheses () and a colon. This is followed by the function body, which contains the function implementation (i.e. the code that we want to store inside the function).

For example, here is a function that prints a greeting. This function is not super useful, but it will serve to illustrate a few important concepts:

# Function definition below
# vvvvvvvvvvvvvvvvvvvvvvvvv

def print_greeting():
    # Function body
    print("Hello, user")

Here is what this function would look like in a real codebase (i.e. without the distracting comments):

def print_greeting():
    print("Hello, user")

Note that the call to print must be indented, typically by 4 spaces (a single tab also works). If you don’t indent it, you will get an error:

def print_greeting():
print("Hello, user")
  Cell In[14], line 2
    print("Hello, user")
    ^
IndentationError: expected an indented block

PEP8 note: Code should always be indented using 4 spaces. However, when working in a REPL it’s common practice to indent code using tabs, because you need to type fewer characters that way. In addition, you can configure most code editors to automatically insert 4 spaces when you press the tab key.

If the function body has multiple statements, you must indent them all:

def print_greetings():
    print("Hello, user")
    print("Hello, other user")

You can’t mix spaces and tabs when indenting multiple statements. For example, in this function definition we indent the first line using 4 spaces and the second line using a tab:

def print_greetings():
    print("Hello, user")
	print("Hello, other user")
  Cell In[16], line 3
    print("Hello, other user")
    ^
TabError: inconsistent use of tabs and spaces in indentation

We can execute the function body with a function call. To call a function we write down the function name followed by parentheses ():

print_greeting()
Hello, user
print_greetings()
Hello, user
Hello, other user

Hooray! We just wrote our very first function! Go tell all your friends about it!

To make our functions more useful, we should give them parameters that we can use to customize their behaviour.

Function Parameters and Arguments

Functions can take parameters, listed between the parentheses, which allow us to pass additional values to the function in a function call. These parameters may then be used inside the function body just like regular variables.

Consider a function that should print a greeting which contains a name. We would write it like this:

def print_greeting(name):
    print(f"Hello {name}")

Here name is a parameter.

When calling a function with parameters, we must pass arguments to the function which are then assigned to the parameters. Since print_greeting has one parameter, we must call it with one argument:

print_greeting("John")
Hello John

Here the string "John" is an argument which is assigned to the parameter name.
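An argument does not have to be a literal string. Any expression that produces a value can be passed, including a variable. A minimal illustration (the variable user_name is made up for this example):

```python
def print_greeting(name):
    print(f"Hello {name}")

user_name = "Jane"

# The value stored in user_name is assigned to the parameter name
print_greeting(user_name)    # prints: Hello Jane

# The expression is evaluated first, then its result is passed
print_greeting("Jo" + "hn")  # prints: Hello John
```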

If we try to call print_greeting without the argument, we will get the following error:

print_greeting()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[21], line 1
----> 1 print_greeting()

TypeError: print_greeting() missing 1 required positional argument: 'name'

We could also write the function call like this:

print_greeting(name="John")
Hello John

A function can take multiple parameters, in which case they are separated by a comma:

def print_complex_greeting(first_name, last_name):
    print(f"Hello {first_name} {last_name}")

If a function takes multiple parameters, we need to pass the corresponding number of arguments in a function call. The arguments must also be separated by commas:

print_complex_greeting("John", "Doe")
Hello John Doe

The number of arguments must be exactly equal to the number of parameters. Passing more or fewer arguments won’t do. For example, if we pass too few arguments, Python will calmly yell at us that we are missing arguments:

print_complex_greeting("John")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[25], line 1
----> 1 print_complex_greeting("John")

TypeError: print_complex_greeting() missing 1 required positional argument: 'last_name'

If we pass too many arguments, Python will tell us so as well:

print_complex_greeting("John", "Michael", "Doe")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[26], line 1
----> 1 print_complex_greeting("John", "Michael", "Doe")

TypeError: print_complex_greeting() takes 2 positional arguments but 3 were given

An important takeaway from this discussion is that you should always carefully read error messages. Unlike those of some other languages, Python’s error messages are generally quite helpful.

Upon careful reading of the errors, we can see that Python calls the arguments we pass positional arguments because the parameters they will be assigned to are determined from their position. Since "John" appears as the first argument in the call, it will be assigned to the first parameter (first_name). "Doe" appears as the second argument in the call and will therefore be assigned to the second parameter (last_name).

Alternatively we could write the above function call like this:

print_complex_greeting(first_name="John", last_name="Doe")
Hello John Doe

Now these arguments are keyword arguments because the parameters they will be assigned to are determined from the argument names. This means that "John" will be assigned to first_name and "Doe" will be assigned to last_name.

If we use keyword arguments, the order of the arguments plays no role. For example, this function call is equivalent to the preceding function call:

print_complex_greeting(last_name="Doe", first_name="John")
Hello John Doe

If we use positional arguments, the order of arguments does play a role. For example this function call is not equivalent to the previous function calls:

print_complex_greeting("Doe", "John")
Hello Doe John

Positional arguments and keyword arguments can be mixed:

print_complex_greeting("John", last_name="Doe")
Hello John Doe

You must be careful when mixing positional and keyword arguments. For example, the following function call is not valid:

print_complex_greeting("Doe", first_name="John")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[31], line 1
----> 1 print_complex_greeting("Doe", first_name="John")

TypeError: print_complex_greeting() got multiple values for argument 'first_name'

This is because "Doe" will be assigned to first_name since it’s the first positional argument and "John" will also be assigned to first_name, because it’s a keyword argument called first_name.

The following is also invalid, because Python does not allow positional arguments after keyword arguments (which wouldn’t make a whole lot of sense anyway):

print_complex_greeting(first_name="John", "Doe")
  Cell In[32], line 1
    print_complex_greeting(first_name="John", "Doe")
                                                   ^
SyntaxError: positional argument follows keyword argument
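To summarize the rules, the sketch below collects the valid ways of calling a function with two parameters. The function print_match and its parameters winner and loser are hypothetical. They only serve to show that keyword arguments can make a call easier to read when the order of the parameters is easy to mix up.

```python
def print_match(winner, loser):
    print(f"{winner} beat {loser}")

# All four calls print: Alice beat Bob
print_match("Alice", "Bob")               # positional only
print_match(winner="Alice", loser="Bob")  # keyword only
print_match(loser="Bob", winner="Alice")  # keyword order does not matter
print_match("Alice", loser="Bob")         # positional first, then keyword
```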

The return Keyword

At the moment we are printing values inside our functions. But usually we would like the function to compute a value and hand it to us. We can accomplish this using the return keyword.

Let’s rewrite the print_greeting function to a get_greeting function that returns the greeting instead of printing it.

def get_greeting(name):
    return f"Hello {name}"

greeting = get_greeting("John")
greeting
'Hello John'

Something that often trips up beginners is that they confuse print and return. However, these are two completely different and unrelated things. The first one - print - is a function which outputs a value to the console. The second one - return - is a keyword which allows a function to return a value to the code that calls it.

Therefore print_greeting and get_greeting are two completely different functions. The function print_greeting prints the greeting and doesn’t return anything useful. For example, if we try to assign the result of print_greeting to a variable, this happens:

another_greeting = print_greeting("John")
Hello John

The print_greeting function prints "Hello John". Now let’s inspect another_greeting:

print(another_greeting)
None

We see that another_greeting has the special value None which essentially means “nothing” in Python. This is because the print_greeting function has no return statement, and a function that finishes without returning a value returns None.

Contrast this with the behaviour of the get_greeting function which doesn’t print anything when we call it:

greeting = get_greeting("John")

However, unlike another_greeting, the variable greeting does have a value:

greeting
'Hello John'

Returning a value from a function is far more common than printing it. After all, if we print a value, we can’t do anything useful with the output later on. However, if we return a value, we can assign that value to a variable and manipulate it further. Here is a pattern you will see quite often:

def get_full_name(first_name, last_name):
    full_name = first_name + " " + last_name
    return full_name

def get_greeting(full_name):
    return f"Hello, {full_name}"


first_name = "John"
last_name = "Doe"

# Get the full name and store it in a variable full_name
full_name = get_full_name(first_name, last_name)

# Use the variable full_name to obtain a greeting 
greeting = get_greeting(full_name)
greeting
'Hello, John Doe'

If we had printed full_name instead of returning it, we wouldn’t have been able to pass it to get_greeting.
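Because a function call evaluates to its return value, the result of one function can also be passed directly to another function without an intermediate variable. A short sketch using the two functions from above:

```python
def get_full_name(first_name, last_name):
    return first_name + " " + last_name

def get_greeting(full_name):
    return f"Hello, {full_name}"

# The inner call runs first; its return value becomes the argument
greeting = get_greeting(get_full_name("John", "Doe"))
print(greeting)  # Hello, John Doe
```

Whether to nest the calls or keep the intermediate variable is mostly a question of readability. The intermediate variable is often easier to follow and to debug.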

Writing a Complex Function

Armed with our knowledge, we can return to the example that motivated this chapter.

Here is how we would write a function that computes the expected scores for two players given their ratings.

def get_expected_scores(rating1, rating2):
    expected1 = 1 / (1 + 10 ** ((rating2 - rating1) / 400))
    expected2 = 1 / (1 + 10 ** ((rating1 - rating2) / 400))

    return expected1, expected2

Note that we can return multiple values from a function by separating them with commas. Now we can use the get_expected_scores function to obtain the new ratings of the players:

def get_new_ratings(rating_winner, rating_loser):
    expected_winner, expected_loser = get_expected_scores(rating_winner, rating_loser)
    new_rating_winner = rating_winner + 32 * (1 - expected_winner)
    new_rating_loser = rating_loser + 32 * (0 - expected_loser)
    return new_rating_winner, new_rating_loser

Notice how get_new_ratings calls get_expected_scores, which improves readability. Generally it’s a good idea to keep functions small and to split subtasks into helper functions.
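Both functions above return two values separated by a comma. When a function does this, Python packs the values into a single tuple. The sketch below shows that the result can either be kept as one tuple or unpacked into separate variables (the variable name scores is chosen here just for illustration):

```python
def get_expected_scores(rating1, rating2):
    expected1 = 1 / (1 + 10 ** ((rating2 - rating1) / 400))
    expected2 = 1 / (1 + 10 ** ((rating1 - rating2) / 400))
    return expected1, expected2

# Keep the result as a single tuple...
scores = get_expected_scores(1000, 1500)
print(type(scores))  # <class 'tuple'>
print(len(scores))   # 2

# ...or unpack it into one variable per returned value
expected1, expected2 = get_expected_scores(1000, 1500)
print(expected1 == scores[0])  # True
```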

Let’s check if our new functions work the way they should. Because get_new_ratings returns multiple values, we assign the result to multiple variables when calling it:

new_winner_rating, new_loser_rating = get_new_ratings(1000, 1500)
new_winner_rating
1030.2963131135352
new_loser_rating
1469.7036868864648
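The same function also covers the scenario from the beginning of the chapter in which the higher-rated player wins. We only need to swap the arguments. A sketch, repeating the two function definitions so that it runs on its own:

```python
def get_expected_scores(rating1, rating2):
    expected1 = 1 / (1 + 10 ** ((rating2 - rating1) / 400))
    expected2 = 1 / (1 + 10 ** ((rating1 - rating2) / 400))
    return expected1, expected2

def get_new_ratings(rating_winner, rating_loser):
    expected_winner, expected_loser = get_expected_scores(rating_winner, rating_loser)
    new_rating_winner = rating_winner + 32 * (1 - expected_winner)
    new_rating_loser = rating_loser + 32 * (0 - expected_loser)
    return new_rating_winner, new_rating_loser

# The player rated 1500 wins against the player rated 1000
new_winner_rating, new_loser_rating = get_new_ratings(1500, 1000)
print(round(new_winner_rating, 2))  # 1501.7
print(round(new_loser_rating, 2))   # 998.3
```

These values match the ones we computed by hand earlier, but this time we did not have to repeat the formulas.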

Let’s take a moment to point out that our functions, parameters and variables have sensible names like get_new_ratings or new_rating_winner. This makes them readable. Generally you should avoid writing functions that look like this:

def flunkify(flunky1, flunky2):
    flunkified_flunky1 = 1 / (1 + 10 ** ((flunky2 - flunky1) / 400))
    flunkified_flunky2 = 1 / (1 + 10 ** ((flunky1 - flunky2) / 400))

    return flunkified_flunky1, flunkified_flunky2

While the flunkify function technically does the same thing as the get_expected_scores function, it contains terrible code, because it is not readable. Code that is not readable will generally lead to headaches for your fellow developers, which will quickly lead to headaches for you personally.

Therefore you should always use sensible names for your variables, parameters and functions.

Docstrings

To further improve the readability of our functions, it’s common practice to document them using so-called documentation strings (or docstrings for short).

A docstring looks like this:

def get_expected_scores(rating1, rating2):
    """
    Calculate the expected scores of two players given their ratings.
    
    The expected score of player A is calculated using the standard Elo
    formula E_A = 1 / (1 + 10 ** ((R_B - R_A) / 400)), where R_A and R_B
    are the ratings of players A and B. The expected score of player B
    is calculated analogously.
    """
    expected1 = 1 / (1 + 10 ** ((rating2 - rating1) / 400))
    expected2 = 1 / (1 + 10 ** ((rating1 - rating2) / 400))

    return expected1, expected2

Docstrings should contain a brief description of what the function does, as well as additional information and a description of potential edge cases.

You should write docstrings for all functions that will be used by other developers, whether in a shared codebase or in a package that is intended to be used by other people.
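Docstrings are more than comments: Python stores them on the function object, where tools and other developers can look them up. A minimal sketch, using a shortened one-line version of the docstring above:

```python
def get_expected_scores(rating1, rating2):
    """Calculate the expected scores of two players given their ratings."""
    expected1 = 1 / (1 + 10 ** ((rating2 - rating1) / 400))
    expected2 = 1 / (1 + 10 ** ((rating1 - rating2) / 400))
    return expected1, expected2

# The docstring is available as the __doc__ attribute of the function
print(get_expected_scores.__doc__)

# In an interactive session, help(get_expected_scores) displays the
# function signature together with the docstring.
```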

Useful Built-in Functions

Python provides a number of useful built-in functions.

In fact, you already encountered two of them - namely print and type:

print(42)
42
type(42)
int

However, print and type are not the only useful built-in functions.

For example, you can call int (which is the int data type itself, used here like a function) to convert a value to an integer:

int(42.2)
42
int("42")
42
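Two details are worth knowing here. Converting a float with int drops the fractional part instead of rounding, and a string can only be converted if it looks like an integer. A small sketch (the variable names are made up for illustration):

```python
# int() truncates toward zero; it does not round
truncated_positive = int(42.9)
truncated_negative = int(-42.9)
print(truncated_positive)  # 42
print(truncated_negative)  # -42

# A string containing a decimal point is not a valid integer literal:
# int("42.2") would raise a ValueError.

# Converting in two steps works: first to float, then to int
two_step = int(float("42.2"))
print(two_step)  # 42
```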

You can also call float (again, this is the float data type itself) to convert a value to a floating point number:

float("42.2")
42.2

You can use min and max to get the minimum or maximum of some values, respectively:

min(42, 43, 44)
42
max(42, 43, 44)
44

The abs function provides the absolute value of an integer or a float:

abs(-32)
32
abs(-38.7)
38.7

The round function allows you to round a value to the specified number of decimal places:

round(12.525, 1)
12.5

These were just a few examples - there are many more useful built-in functions.
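The built-ins shown above combine naturally with the rating example from the start of the chapter. The following sketch reuses the ratings from that example. The variable names rating_gap, higher_rating, displayed_rating and parsed_rating are hypothetical.

```python
rating1 = 1000
rating2 = 1500
new_rating1 = 1030.2963131135352  # rating of the first player after the win

rating_gap = abs(rating1 - rating2)     # size of the gap, regardless of sign
higher_rating = max(rating1, rating2)   # rating of the stronger player
displayed_rating = round(new_rating1)   # whole number for display
parsed_rating = int("1500")             # e.g. a rating read from text input

print(rating_gap)        # 500
print(higher_rating)     # 1500
print(displayed_rating)  # 1030
print(parsed_rating)     # 1500
```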