As much as I love Python, there are two things about it I’ve seen drive programmers experienced in other languages absolutely nuts. The first is the syntax. Instead of using curly braces to delimit blocks of code, Python uses indentation. The second is how variables are handled.
But wait! Aren’t variables like boxes with names, that we can put data into using the names?
Well, yes, and barring pointer operations in C and its derivative languages, that mental model often works perfectly well. For most languages.
So what makes Python different? First, the fact that everything is an object. Even basic datatypes like integers.
Second, variables in Python, like those in PHP, are dynamically typed. The type of data is not predetermined when you first create the variable name, or “identifier”. Because of this, you will never see a variable declaration specifying a type, as you would in Java or C, like this:
int car = 3;
As a result, the following commands are perfectly valid:
car = 3
car = "Ferrari"
Third, and most importantly, the variable identifier is just a reference to the data. Any data. Even integers. For example, if I create the variables “car” and “pet”, and set them both to 3, they both point to an integer object with the value of 3. If I then set the variable “joe” equal to “car”, it doesn’t make a copy of the value in “car” – it just points “joe” to the same object as “car”.
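A quick way to check this in the interpreter is the "is" operator, which tests whether two names refer to the very same object. The sketch below reuses the names from the paragraph above. Note that two separate assignments of 3 ending up on one shared object is a CPython caching detail for small integers, not something the language promises.

```python
car = 3
pet = 3
joe = car  # no copy is made; joe now refers to the same object as car

print(joe is car)          # True: both names are bound to one object
print(id(joe) == id(car))  # True: id() reports the object's identity

# In CPython, small integers are cached, so this usually prints True as well,
# but that is an implementation detail rather than a language guarantee.
print(car is pet)
```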
For immutable data types like numbers and strings, the behavior is effectively the same as you’re used to in any other language, because the object itself can never change. When you assign a new value to a variable, the name just gets redirected to the new object, and any other names that pointed to the old object are unaffected.
Where things get interesting, though, is when dealing with mutable data, like lists.
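Here is a minimal sketch of that rebinding behavior. The names "name" and "alias" are just illustrative additions.

```python
car = 3
joe = car      # joe and car refer to the same integer object
car = car + 1  # builds a new integer (4) and rebinds car to it

print(car)  # 4
print(joe)  # 3, because the object joe refers to was never modified

name = "Ferrari"
alias = name
name += " F40"  # strings are immutable, so this creates a new string

print(name)   # Ferrari F40
print(alias)  # Ferrari
```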
Try this out:
L1 = [2,3,4]
L2 = L1
L1[0] = 24
In the first two lines, you create a list and set the second variable, L2, equal to the first. Since we are just passing references, L2 ends up pointing to the very same list as L1.
So when, in the third line, we set the first item in the list L1 to 24, we are changing that one shared list. Since both L1 and L2 point to the same list, L2[0] is now 24 as well.
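If you want to see this for yourself, the same three lines with a few prints added make the sharing visible:

```python
L1 = [2, 3, 4]
L2 = L1      # L2 is a second name for the same list, not a copy
L1[0] = 24   # mutate the list in place through the name L1

print(L1)        # [24, 3, 4]
print(L2)        # [24, 3, 4]
print(L1 is L2)  # True: there is only one list object
```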
Why was Python designed to behave like this? I haven’t found a specific answer either on python.org or on Guido van Rossum’s blog detailing the history and philosophy behind the language, but it’s almost certainly part of trying to make everything an object while allowing dynamic variable types, reducing memory overhead as much as possible, and keeping you from having to deal with memory management.
One of the most useful consequences of these design choices is the ridiculous amount of flexibility in how you pass around and manipulate data. Since everything is an “object”, you are not restricted to “integers” and “floats”. Specifically, I’m talking about “duck typing”, named for the expression “If it walks like a duck and quacks like a duck, then it’s a duck.” In the case of Python (or id-type objects in Objective-C), what this means is that not only can you assign any type of data to a variable, as we already discussed, but you can also try to call any method on it you wish, and the call works as long as the object supports the method.
This is fundamental. Unlike, for example, Java – where polymorphism is limited to methods declared by a class, one of its superclasses, or an interface it implements – Python doesn’t care what type the object is – it just cares whether the method exists. It doesn’t matter whether it’s a duck, a rectangle, or a car, it just matters if it responds to the method “quack()”.
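A small sketch of duck typing in action. The Duck, Car, and Rectangle classes and the make_it_quack() function are hypothetical names made up for this illustration.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Car:
    # Unrelated to Duck: no shared base class or interface.
    def quack(self):
        return "Honk, pretending to quack"


class Rectangle:
    pass  # no quack() method at all


def make_it_quack(thing):
    # No type check: we simply call the method and see whether it exists.
    return thing.quack()


print(make_it_quack(Duck()))
print(make_it_quack(Car()))

try:
    make_it_quack(Rectangle())
except AttributeError as err:
    # Python only complains at the moment the missing method is looked up.
    print("Cannot quack:", err)
```

The failure only shows up at runtime, when the missing method is actually called, which is the price you pay for the flexibility.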
So how do we deal with this?
Well, if you’re just passing the variable into a function to be read, it doesn’t matter. If, on the other hand, you want to create a new, modified data set without changing the original, you need to make a copy first and modify that.
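To make the difference concrete, here is a rough sketch with three hypothetical functions: one that only reads its argument, one that modifies the caller’s list in place, and one that works on a copy.

```python
def total(numbers):
    # Read-only use: sharing the caller's list is harmless.
    return sum(numbers)


def add_bonus_in_place(numbers):
    # Mutates the caller's list, because numbers is the same object.
    numbers.append(100)


def with_bonus(numbers):
    # Works on a copy, so the caller's list is left untouched.
    result = list(numbers)
    result.append(100)
    return result


scores = [1, 2, 3]
print(total(scores))  # 6

new_scores = with_bonus(scores)
print(scores)         # [1, 2, 3]
print(new_scores)     # [1, 2, 3, 100]

add_bonus_in_place(scores)
print(scores)         # [1, 2, 3, 100]
```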
Even with copies we run into issues. Lists are objects just like anything else. What happens if the list you are copying contains lists? A simple copy of the top-level list makes a separate set of references for anything in the list, so now both lists seem to be entirely separate. Yet, if the list we are copying has sublists, our copy still contains shared pointers to those sublists, with all of the problems we saw above.
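A short sketch of that pitfall, using made-up variable names: the top level of the copy is independent, but the sublist is still shared.

```python
original = [0, 1, ["a", "b"]]
shallow = original[:]   # new outer list, same inner objects

shallow[0] = 99         # rebinding a top-level slot affects only the copy
shallow[2].append("c")  # mutating the shared sublist shows up in both

print(original)  # [0, 1, ['a', 'b', 'c']]
print(shallow)   # [99, 1, ['a', 'b', 'c']]
print(original[2] is shallow[2])  # True
```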
So first, unless you, the programmer, explicitly ask it to do otherwise, Python only passes references. This is never going to cause trouble when dealing with immutable data types.
Secondly, if you want to make a separate copy of a list or other mutable datatype, you have to explicitly copy it by using a slice or the “copy” module, like this:
aList = [0,1,2,3]
bList = aList[:] # shallow copy of aList using a slice
or
import copy #import the copy module
aList = [0,1,2,3]
bList = copy.copy(aList) # shallow copy using copy module
Third – if you have sublists or other mutable data types within a mutable data type, you need to make “deep” copies. Deep copies make separate copies of any sublists, dictionaries, or other mutable data that would cause issues in a shallow copy. You can either code your own copying routine or, again, import the copy module and use its deepcopy function.
import copy #import the copy module
aList = [0,1,2,3, ["a", "b", "c"] ] # list with sublist
bList = copy.deepcopy(aList) # deep copy using copy module
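As a quick check, the following sketch extends the example above to show that the sublist in the deep copy really is a separate object:

```python
import copy

aList = [0, 1, 2, 3, ["a", "b", "c"]]
bList = copy.deepcopy(aList)

bList[4].append("d")  # only the copy's sublist changes

print(aList)  # [0, 1, 2, 3, ['a', 'b', 'c']]
print(bList)  # [0, 1, 2, 3, ['a', 'b', 'c', 'd']]
print(aList[4] is bList[4])  # False
```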
Hopefully this helps someone out there who is trying to wrap their head around Python variable handling.