4.2. Collections

4.2.1. Ordered Collections

4.2.1.1. Lists

A Python list is an ordered collection of items.

We can create lists using the following syntax:

[item1, item2, ...,  itemN]

where the ... represents any number of additional items.

Each item can be of any type.

Let’s create some lists.

# created, but not assigned to a variable
[2.0, 9.1, 12.5]
[2.0, 9.1, 12.5]
# stored as the variable `x`
x = [2.0, 9.1, 12.5]
print("x has type", type(x))
x
x has type <class 'list'>
[2.0, 9.1, 12.5]

4.2.1.1.1. What Can We Do with Lists?

We can access items in a list called mylist using mylist[N] where N is an integer.

Note: Anytime that we use the syntax x[i] we are doing what is called indexing – it means that we are selecting a particular element of a collection x.

x[1]
9.1

Wait, why did x[1] return 9.1 when the first element in x is actually 2.0?

This happened because Python starts counting at zero!

Let’s repeat that one more time for emphasis: Python starts counting at zero!

To access the first element of x we must use x[0]:

x[0]
2.0

We can also determine how many items are in a list using the len function.

len(x)
3

What happens if we try to index with a number higher than the number of items in a list?

# uncomment the line below and run
# x[4]
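
As a minimal sketch of what to expect, the snippet below wraps the out-of-range access in a try/except block so that the error is reported rather than stopping the program. The names last_valid and error_name are introduced here only for illustration.

```python
x = [2.0, 9.1, 12.5]

# Valid indices run from 0 to len(x) - 1, so 2 is the last valid index here
last_valid = len(x) - 1
print("last valid index:", last_valid)

try:
    x[4]  # there is no item at position 4
    error_name = None
except IndexError as err:
    # Python raises an IndexError for an out-of-range index
    error_name = type(err).__name__
    print("IndexError:", err)
```

Any index greater than len(x) - 1 fails in the same way, including x[3] for this three-item list.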

We can check if a list contains an element using the in keyword.

2.0 in x
True
1.5 in x
False

Another common operation is sorting, which the sort method performs on the list itself.

number_list = [10, 25, 42, 1.0]
print(number_list)
number_list.sort()
print(number_list)
[10, 25, 42, 1.0]
[1.0, 10, 25, 42]

Note that in order to sort, all elements in our list had to be numbers (int and float); more on this below.

We could actually do the same with a list of strings. In this case, sort will put the items in alphabetical order.

str_list = ["NY", "AZ", "TX"]
print(str_list)
str_list.sort()
print(str_list)
['NY', 'AZ', 'TX']
['AZ', 'NY', 'TX']
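
One detail the examples above rely on is that sort changes the list in place and returns None. The sketch below illustrates this and, as an aside not covered in the text above, contrasts it with the built-in sorted function, which returns a new list and leaves the original untouched. The variable names are chosen here for illustration.

```python
str_list = ["NY", "AZ", "TX"]

# sort() reorders the existing list and returns None
result = str_list.sort()
print(result)
print(str_list)

# sorted() builds a new list and leaves the original unchanged
original = ["NY", "AZ", "TX"]
new_list = sorted(original)
print(original)
print(new_list)
```

This is why the examples call str_list.sort() on its own line instead of writing str_list = str_list.sort(), which would replace the list with None.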

The append method adds an element to the end of an existing list.

num_list = [10, 25, 42, 8]
print(num_list)
num_list.append(10)
print(num_list)
[10, 25, 42, 8]
[10, 25, 42, 8, 10]

4.2.1.2. Lists of Different Types

While most examples above used lists whose items all have the same type, this is not required.

Let’s carefully make a small change to the first example: replace 2.0 with 2.

x = [2, 9.1, 12.5]

The list now holds an int alongside two floats, but it behaves identically for many operations you might apply to a list.

import numpy as np
x = [2, 9.1, 12.5]
np.mean(x) == sum(x)/len(x)
True

Here we have also introduced a new module, NumPy, which provides many functions for working with numeric data.

Taking this further, we can put completely different types of elements inside of a list.

# stored as the variable `x`
x = [2, "hello", 3.0]
print("x has type", type(x))
x
x has type <class 'list'>
[2, 'hello', 3.0]

While no programming limitations prevent this, you should be careful if you write code with different numeric and non-numeric types in the same list.

For example, if the types within the list cannot be compared, then how could you sort the elements of the list? (That is, how do you determine whether the string “hello” is less than the integer 2, i.e. whether “hello” < 2?)

x = [2, "hello", 3.0]
# uncomment the line below and see what happens!
# x.sort()
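
For reference, here is a small sketch of what happens when the sort is attempted. It catches the error so the rest of the code keeps running; the flag sort_failed is a name introduced here for illustration.

```python
x = [2, "hello", 3.0]

try:
    x.sort()  # Python must compare a str with a number to order the items
    sort_failed = False
except TypeError as err:
    # Strings and numbers cannot be ordered with <, so a TypeError is raised
    sort_failed = True
    print("TypeError:", err)
```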

A few key exceptions to this general rule are:

  • Lists with both integers and floating points are less error-prone (since mathematical code using the list would work with both types).

  • When working with lists and data, you may want to represent missing values with a different type than the existing values.
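
The second exception can be sketched as follows, assuming we choose None as the marker for a missing value (one common convention, not the only one). The list gdp_with_gap is hypothetical: it reuses two values from this lecture and pretends the middle observation is unavailable.

```python
# Hypothetical data: the middle observation is treated as missing
gdp_with_gap = [9.607, None, 11.06]

# Keep only the observed values before doing arithmetic
observed = [value for value in gdp_with_gap if value is not None]

# The mean is computed over the two observed values only
mean_observed = sum(observed) / len(observed)
print(observed)
print(mean_observed)
```

Because None is not a number, calling sum(gdp_with_gap) directly would raise a TypeError, which makes the missing value hard to overlook.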

4.2.1.3. The range Function

One function you will see often in Python is the range function.

It has three versions:

  1. range(N): goes from 0 to N-1

  2. range(a, N): goes from a to N-1

  3. range(a, N, d): goes from a up to (but not including) N, counting by d

When we call the range function, we get back something that has type range:

r = range(5)
print("type(r)", type(r))
type(r) <class 'range'>

To turn the range into a list:

list(r)
[0, 1, 2, 3, 4]
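
To see the three versions side by side, the short sketch below converts each one to a list. The particular arguments are arbitrary choices made for illustration.

```python
# One argument: start at 0 and stop before 5
one_arg = list(range(5))
print(one_arg)     # [0, 1, 2, 3, 4]

# Two arguments: start at 2 and stop before 5
two_args = list(range(2, 5))
print(two_args)    # [2, 3, 4]

# Three arguments: start at 2, stop before 10, count by 3
three_args = list(range(2, 10, 3))
print(three_args)  # [2, 5, 8]
```

In the last case the values stop at 8, because the next step (11) would pass the upper bound.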

4.2.1.4. What are Tuples?

Tuples are very similar to lists and hold ordered collections of items.

However, tuples and lists have three main differences:

  1. Tuples are created using parentheses, ( and ), instead of square brackets, [ and ].

  2. Tuples are immutable, which is a fancy computer science word meaning that they can’t be changed or altered after they are created.

  3. Tuples and multiple return values from functions are tightly connected, as we will see in functions.

t = (1, "hello", 3.0)
print("t is a", type(t))
t
t is a <class 'tuple'>
(1, 'hello', 3.0)

We can convert a list to a tuple by calling the tuple function on a list.

print("x is a", type(x))
print("tuple(x) is a", type(tuple(x)))
tuple(x)
x is a <class 'list'>
tuple(x) is a <class 'tuple'>
(2, 'hello', 3.0)

We can also convert a tuple to a list using the list function.

list(t)
[1, 'hello', 3.0]

As with a list, we access items in a tuple t using t[N] where N is an int.

t[0]  # still start counting at 0
1
t[2]
3.0
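
Reading an item works the same way for tuples and lists, but assigning to an item does not. The sketch below illustrates the immutability mentioned earlier; the names assignment_failed and as_list are introduced here for illustration.

```python
t = (1, "hello", 3.0)

try:
    t[0] = 5  # tuples do not allow items to be replaced
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

# Converting to a list gives a mutable copy; the tuple itself is unchanged
as_list = list(t)
as_list[0] = 5
print(as_list)
print(t)
```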

Tuples (and lists) can be unpacked directly into variables.

x, y = (1, "test")
print(f"x = {x}, y = {y}")
x = 1, y = test
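
As a further sketch, unpacking also works with a list on the right-hand side, and it is the mechanism behind functions that return several values. The function min_and_max below is a hypothetical helper written for this illustration.

```python
# Unpacking a list: the number of names must match the number of items
a, b, c = [2.0, 9.1, 12.5]
print(a, b, c)

def min_and_max(values):
    """Return the smallest and largest item; the two values travel as one tuple."""
    return min(values), max(values)

result = min_and_max([2.0, 9.1, 12.5])
print(type(result))

# The returned tuple can be unpacked directly into two variables
low, high = result
print(low, high)
```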

4.2.1.5. List vs Tuple: Which to Use?

Should you use a list or tuple?

This depends on what you are storing, whether you might need to reorder the elements, and whether you could add new elements without completely reinterpreting the underlying data.

In general, a rule of thumb is to use a list unless you need to use a tuple.

Key criteria for tuple use are when you want to:

  • ensure the order of elements can’t change

  • ensure the actual values of the elements can’t change

  • use the collection as a key in a dict
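
The last criterion can be sketched briefly (dictionaries are introduced in the next section). The dict gdp_by_country_year and the flag list_key_failed are names made up for this illustration; the GDP values are the ones used elsewhere in this lecture.

```python
# A tuple works as a dict key because it cannot change after creation
gdp_by_country_year = {("China", 2015): 11.06, ("China", 2014): 10.48}
gdp_2015 = gdp_by_country_year[("China", 2015)]
print(gdp_2015)

try:
    bad = {["China", 2015]: 11.06}  # a list is mutable, so it cannot be a key
    list_key_failed = False
except TypeError as err:
    list_key_failed = True
    print("TypeError:", err)
```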

For example, take data representing the GDP (in trillions) and population (in billions) for China in 2015.

china_data_2015 = ("China", 2015, 11.06, 1.371)

print(china_data_2015)
('China', 2015, 11.06, 1.371)

In this case, we have used a tuple because (a) reordering the elements would be meaningless and (b) adding more data would require a reinterpretation of the whole data structure.

On the other hand, consider a list of GDP in China between 2013 and 2015.

gdp_data = [9.607, 10.48, 11.06]
print(gdp_data)
[9.607, 10.48, 11.06]

In this case, we have used a list, since appending a new element for GDP in 2016 would make complete sense.
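
To illustrate the difference without inventing a 2016 figure, the sketch below starts from the first two years and appends the 2015 value already shown above. It then checks that a tuple offers no append method; the names gdp_tuple and has_append are introduced here for illustration.

```python
# Start with 2013 and 2014, then add 2015 when it becomes available
gdp_data = [9.607, 10.48]
gdp_data.append(11.06)
print(gdp_data)

# A tuple holding the same first two values cannot grow in place
gdp_tuple = (9.607, 10.48)
has_append = hasattr(gdp_tuple, "append")
print(has_append)
```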

Along these lines, collecting data on China for different years may make sense as a list of tuples (e.g. year, GDP, and population – although we will see better ways to store this sort of data in the Pandas section).

china_data = [(2015, 11.06, 1.371), (2014, 10.48, 1.364), (2013, 9.607, 1.357)]
print(china_data)
[(2015, 11.06, 1.371), (2014, 10.48, 1.364), (2013, 9.607, 1.357)]
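
A short sketch of how such a list of tuples might be used: each row can be selected by index and unpacked into named variables, and the rows can be sorted because tuples compare element by element, starting with the year. The variable names are chosen here for illustration.

```python
china_data = [(2015, 11.06, 1.371), (2014, 10.48, 1.364), (2013, 9.607, 1.357)]

# Select the first row and unpack its three fields
year, gdp, population = china_data[0]
print(year, gdp, population)

# Sorting compares the first element of each tuple first, i.e. the year
china_data.sort()
years = [row[0] for row in china_data]
print(years)
```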

4.2.2. Associative Collections

4.2.2.1. Dictionaries

A dictionary (or dict) associates keys with values.

It will feel similar to a dictionary for words, where the keys are words and the values are the associated definitions.

The most common way to create a dict is to use curly braces — { and } — like this:

{"key1": value1, "key2": value2, ..., "keyN": valueN}

where the ... indicates that we can have any number of additional terms.

The crucial part of the syntax is that each key-value pair is written key: value and that these pairs are separated by commas (,).

Let’s see an example using our aggregate data on China in 2015.

china_data = {"country": "China", "year": 2015, "GDP": 11.06, "population": 1.371}
print(china_data)
{'country': 'China', 'year': 2015, 'GDP': 11.06, 'population': 1.371}

Unlike our above example using a tuple, a dict allows us to associate a name with each field, rather than having to remember the order within the tuple.
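
The contrast can be sketched by looking up the same value both ways. The names china_tuple and china_dict are used here only to keep the two versions apart.

```python
china_tuple = ("China", 2015, 11.06, 1.371)
china_dict = {"country": "China", "year": 2015, "GDP": 11.06, "population": 1.371}

# With a tuple we must remember that GDP sits at position 2
gdp_from_tuple = china_tuple[2]

# With a dict we ask for the field by name
gdp_from_dict = china_dict["GDP"]

print(gdp_from_tuple, gdp_from_dict)
```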

Often, code that makes a dict is easier to read if we put each key: value pair on its own line. (Recall our earlier comment on using whitespace effectively to improve readability!)

The code below is equivalent to what we saw above.

china_data = {
    "country": "China",
    "year": 2015,
    "GDP": 11.06,
    "population": 1.371
}

Most often, the keys (e.g. “country”, “year”, “GDP”, and “population”) will be strings, but we could also use numbers (int or float) or even tuples (or, rarely, a combination of types).
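
As a small sketch of non-string keys, the dict below uses years (int) as keys, with the GDP values that appear earlier in this lecture. The second dict mixes key types purely to show that Python allows it; both dicts are hypothetical examples.

```python
# Integer keys: look up GDP by year
gdp_by_year = {2013: 9.607, 2014: 10.48, 2015: 11.06}
gdp_2014 = gdp_by_year[2014]
print(gdp_2014)

# Mixed key types are allowed, though rarely a good idea
mixed_keys = {"country": "China", 2015: 11.06, ("GDP", 2014): 10.48}
print(mixed_keys[("GDP", 2014)])
```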

The values can be any type and different from each other.

This next example is meant to emphasize how values can be anything – including another dictionary.