Chapter 4. Classes

Chapter 4. Classes#

Objects and Classes#

An object is a data structure that contains state and behaviour.

Before we can create objects, we (usually) need to define a class which will be our template for creating objects. The objects which we create from that template are called instances of the class.

Consider the following example:

We have a ball which has a position and a speed. The position and the speed describe the state of the ball. The ball can also move, i.e. update its position based on the speed. This is the behaviour of the ball.

We can define a class Ball (which will be the template for creating ball objects in the future) like this:

class Ball:
    def __init__(self, pos, speed):
        self.pos = pos
        self.speed = speed

Here the state consists of the variables pos and speed. Generally speaking, variables that are associated with objects (and represent its state) are called attributes.

PEP8 note: Class names should normally use the CapWords convention, e.g. Ball or MyAwesomeClass.

The __init__ function (commonly referred to as constructor) is a special function which initializes an object with some initial state.

For example if we want to create a ball with the position 10 and the speed 8, we write:

my_ball = Ball(10, 8)

Just like with functions, we can pass positional arguments and/or keyword arguments to the constructor:

my_ball = Ball(pos=10, speed=8)

Let us inspect the attributes of the ball object:

f"pos = {my_ball.pos}, speed = {my_ball.speed}"

'pos = 10, speed = 8'

We can also get the class of the object using the type function:

type(my_ball)

__main__.Ball

We can also confirm that the my_ball object is an instance of the Ball class using the isinstance function. This will come in handy later:

isinstance(my_ball, Ball)

True

We could now change the state of the ball directly. For example this is how we could move the ball:

t = 2
my_ball.pos += my_ball.speed * t

f"pos = {my_ball.pos}, speed = {my_ball.speed}"

'pos = 26, speed = 8'

But this isn’t really the way to go. What if we wanted to change the speed calculation later (for example taking friction into account)? Then we would need to adjust the calculation in every single place we use it, which is a lot of work and could introduce bugs if we forget the calculation adjustment somewhere.

If you paid attention in the previous chapter, you will quickly realize that we have the exact same problem that motivated the introduction of functions.

After we introduced functions, we stored the repetitive calculations in the function body and then simply called the function when we needed those calculations. Luckily, classes and objects provide a similar mechanism.

We can define functions on classes - these functions are called methods and define the behaviour of the objects of that class. The first parameter of every method is an instance of the current object. By convention this parameter is called self.

For example, here is how we could implement a move method on the Ball class:

class Ball:
    def __init__(self, pos, speed):
        self.pos = pos
        self.speed = speed
        
    def move(self, t):
        self.pos += self.speed * t

Here, self will be the ball object the method is invoked on. For example if we call move on the object my_ball then self will be my_ball. If we call move on the object my_other_ball then self will be my_other_ball.

If you are working in the REPL, after creating the new class you need to recreate the my_ball object, otherwise you will still be using the old class!

my_ball = Ball(10, 8)
my_ball.move(2)

f"pos = {my_ball.pos}, speed = {my_ball.speed}"

'pos = 26, speed = 8'

If you see the error message AttributeError: 'Ball' object has no attribute 'move' this means that you are still using the old ball class and you need to recreate my_ball.

Note how executing the methods that defined the behaviour changed the object state. This is what objects are all about. They are initialized with some state and then we can use their behaviour to update the state.

We should also point out that apart from having to write classes yourself, you will use them all the time. In fact, we will introduce many extremely important classes in the very next chapter.

The `str` and `repr` Methods#

Quite often, we want to obtain a string representation of an object (e.g. using the print function) that shows its current state. Unfortunately, at the moment, if we try to print or get the representation of an object, we will receive a string that’s pretty useless.

my_ball

<__main__.Ball at 0x7fb37c76e6d0>

print(my_ball)

<__main__.Ball object at 0x7fb37c76e6d0>

The weird string at the end of the output (yours will probably be different) is the memory address of the object represented as a hexadecimal number (because we use the CPython runtime).

Luckily, Python allows us to change this behaviour. Classes may implement two special methods called __str__ and __repr__.

The __str__ method produces a human-readable string for consumption by the end user. The __repr__ method produces an unambiguous representation of the object. Basically __str__ is intended to produce an “informal” representation of an object, while __repr__ is intended to produce a formal representation of an object.

This is reflected in the circumstances under which these methods are called. For example if you just output an object in the REPL, you will see the output of a call to the __repr__ method of that object. If you print an object, you will see the result of a call to the __str__ method of that object.

This makes sense - in the REPL we generally want to see an unambiguous representation of our objects. On the other hand, when we call print we usually want a human-readable string that displays the information we primarily care about.

These two methods often have the same implementation (since the end user often wants to see the unambiguous representation), but this doesn’t have to be the case. For example, we could argue, that the end user doesn’t care about the speed of the ball and only cares about its current position. Then the __str__ method would return a string that doesn’t contain the speed attribute. However the __repr__ method should still return all attributes:

class Ball:
    def __init__(self, pos, speed):
        self.pos = pos
        self.speed = speed
        
    def move(self, t):
        self.pos += self.speed * t
        
    def __str__(self):
        return f"Ball(pos={self.pos})"
    
    def __repr__(self):
        return f"Ball(pos={self.pos}, speed={self.speed})"

ball = Ball(pos=10, speed=8)

# Note how printing the ball gives us
# the output of __str__

print(ball)

Ball(pos=10)

# However just outputting the ball in a REPL
# gives us the output of __repr__

ball

Ball(pos=10, speed=8)

We can also call __str__ and __repr__ manually by calling the str and repr functions:

str(ball)

'Ball(pos=10)'

repr(ball)

'Ball(pos=10, speed=8)'

If __str__ and __repr__ should return the same string, it’s enough to just define __repr__. If no __str__ is defined and you try to call it, Python will fallback to calling __repr__:

class Ball:
    def __init__(self, pos, speed):
        self.pos = pos
        self.speed = speed
        
    def move(self, t):
        self.pos += self.speed * t
    
    def __repr__(self):
        return f"Ball(pos={self.pos}, speed={self.speed})"

ball = Ball(pos=10, speed=8)

ball

Ball(pos=10, speed=8)

print(ball)

Ball(pos=10, speed=8)

Writing __repr__ for practically all your classes should be second nature for you from now on. Being able to see the state of an object when outputting it is absolutely vital when trying to spot bugs and attempting to understand what’s going on in your application.

Therefore whenever you define a class, add a __repr__ method which outputs a string that unambiguously represents the object it was called on. The best way to go about this is to return a string whose value represents the instantiation of a new object.

For instance, in the above example __repr__ returned a string with the value Ball(pos=10, speed=8). If we copy that value into our codebase, we would create a new Ball object with the exact same state as the object that produced this string:

Ball(pos=10, speed=8)

Ball(pos=10, speed=8)

Object Identity#

Let’s create two instances of the class Ball whose attributes have the same values:

ball1 = Ball(10, 8)
ball2 = Ball(10, 8)

ball1

Ball(pos=10, speed=8)

ball2

Ball(pos=10, speed=8)

Note that all the attributes of these two objects have the same values, but they are totally different objects:

This means that if we change the values of ball1, then ball2 will be completely unaffected:

ball1.pos += ball1.speed * 2

The value of ball1.pos has now changed:

ball1

Ball(pos=26, speed=8)

However the value of ball2.pos is exactly the same as before:

ball2

Ball(pos=10, speed=8)

But what happens if we write this?

ball1 = Ball(10, 8)
ball2 = ball1

Now we have a completely different situation. The symbolic names ball1 and ball2 are still two different names, but they refer to the exact same object:

This means that whenever we make a change to that object using the name ball1, that change will be visible when we access the object using the name ball2. Consider this statement:

ball1.pos += ball1.speed

Outputting ball1 obviously shows the new value of pos:

ball1

Ball(pos=18, speed=8)

However, outputting ball2 also shows the new value of pos!

ball2

Ball(pos=18, speed=8)

We say that ball1 is ball2. These two names refer to the exact same object. Put differently, ball1 and ball2 have the same identity.

The is operator can be used to check if two names represent the same object:

ball1 is ball2

True

We can also see that ball1 and ball2 are the same object using the id function. Generally speaking, this function hands us a number that is unique for each object (for the lifetime of that object). In the Python interpreter we downloaded in chapter 1 (which is the CPython interpreter) this is achieved by returning the memory address of the object.

In this case ball1 and ball2 point to the same object, so their memory addresses are the same:

id(ball1)

140408864040944

id(ball2)

140408864040944

id(ball1) == id(ball2)

True

However, if we have two names that refer to different objects, then their memory addresses will not be the same and the is operator will return False even if the object attributes are all equal:

ball1 = Ball(10, 8)
ball2 = Ball(10, 8)

id(ball1)

140408864041232

id(ball2)

140408864040320

id(ball1) == id(ball2)

False

ball1 is ball2

False

Object Equality#

An interesting question is what happens if we use the equality operator == on objects. By default, the equality operator does the same thing as the is operator:

ball1 = Ball(10, 8)
ball2 = Ball(10, 8)

ball1 == ball2

False

ball1 = Ball(10, 8)
ball2 = ball1

ball1 == ball2

True

This is not particularly useful. After all, if two balls have the exact same position and the exact same speed, we would probably want them to be equal (even though the is operator returns False).

Luckily, Python gives us a way to achieve this by overriding (i.e. providing a custom implementation) for the equality operator. In order to do this, we need to write a custom __eq__ method which takes self and the other object we want to compare this object to:

class Ball:
    def __init__(self, pos, speed):
        self.pos = pos
        self.speed = speed

    # some methods
    
    def __eq__(self, other):
        return isinstance(other, Ball) and self.pos == other.pos and self.speed == other.speed

    def __repr__(self):
        return f"Ball(pos={self.pos}, speed={self.speed})"

Our __eq__ method checks that the other object is an instance of Ball and whether all attributes of self and other are equal. If they are, we return True, otherwise we return False.

Now the == operator works the way we would like it to:

ball1 = Ball(10, 8)
ball2 = Ball(10, 8)

ball1 == ball2

True

Note that how and if you want to override == depends on your needs. For example it might actually be the case, that two balls shouldn’t compare equal, even if their attributes have the same values.

But generally speaking, if you override the equality operator, you should override it by checking that the two objects it compares are instances of the same class and their attributes have the same values.