Exercise 2.6

Objectives:

  • Explore the power of having first-class objects.

Files Created: None.

Files Modified: None.

(a) First-class Data

The file Data/portfolio.csv contains data organized as columns that look like this:

name,shares,price
"AA",100,32.20
"IBM",50,91.10
...

In previous code, we used the csv module to read the file, but still had to perform manual type conversions. For example:

for row in f_csv:
    name   = row[0]
    shares = int(row[1])
    price  = float(row[2])

This kind of conversion can also be performed in a more clever manner using some basic list operations. Make a Python list that contains the conversion functions you would use to convert each column into the appropriate type:

>>> coltypes = [str, int, float]
>>>

The reason you can even create this list is that everything in Python is "first-class." So, if you want to have a list of functions, that’s fine. The items in the list you created are functions for converting a value x into a given type (e.g., str(x), int(x), float(x)).
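
As a minimal sketch of what "first-class" means here, the following stores conversion functions in a list and in a dictionary and calls them later. The names converters, to_int, and by_name are illustrative only and are not part of the exercise files.

```python
# Functions are ordinary objects: they can be stored in containers
# and called later through whatever name or index refers to them.
converters = [str, int, float]          # illustrative name, not from the exercise

to_int = converters[1]                  # the very same object as the built-in int
value = to_int('42')                    # same as int('42')

# A dictionary of functions works just as well (hypothetical example).
by_name = {'shares': int, 'price': float}
price = by_name['price']('32.20')       # same as float('32.20')

print(to_int is int, value, price)
```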

Now, read a row of data from the above file:

>>> import csv
>>> f = open('Data/portfolio.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> row = next(f_csv)
>>> row
['AA', '100', '32.20']
>>>

As noted, this row isn’t enough to do calculations because the types are wrong. For example:

>>> row[1] * row[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'str'
>>>

However, maybe the data can be paired up with the types you specified in coltypes. For example:

>>> coltypes[1]
<class 'int'>
>>> row[1]
'100'
>>>

Try converting one of the values:

>>> coltypes[1](row[1])     # Same as int(row[1])
100
>>>

Try converting a different value:

>>> coltypes[2](row[2])     # Same as float(row[2])
32.2
>>>

Zip the column types with the fields and look at the result:

>>> r = list(zip(coltypes, row))
>>> r
[(<class 'str'>, 'AA'), (<class 'int'>, '100'), (<class 'float'>, '32.20')]
>>>

You will notice that this has paired a type conversion with a value. For example, int is paired with the value '100'. The zipped list is useful if you want to perform conversions on all of the values, one after the other. Try this:

>>> converted = []
>>> for func, val in zip(coltypes, row):
...     converted.append(func(val))
...
>>> converted
['AA', 100, 32.2]
>>>

Make sure you understand what’s happening in the above code. In the loop, the func variable is one of the type conversion functions (str, int, or float) and the val variable is one of the values such as 'AA' or '100'. The expression func(val) simply converts a value (kind of like a type cast).

The above code can be compressed into a single list comprehension. Try this:

>>> converted = [func(val) for func,val in zip(coltypes,row)]
>>> converted
['AA', 100, 32.2]
>>>
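
The same comprehension can be applied to every row rather than just one. The sketch below assumes the three-column layout shown earlier and reads from an in-memory string (sample, a made-up stand-in for Data/portfolio.csv) so that it runs on its own. The helper name convert_row is hypothetical.

```python
import csv
import io

# Stand-in for Data/portfolio.csv so the example is self-contained.
sample = 'name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n'

coltypes = [str, int, float]

def convert_row(types, row):
    # Apply each conversion function to the matching field.
    return [func(val) for func, val in zip(types, row)]

f_csv = csv.reader(io.StringIO(sample))
headers = next(f_csv)                       # consume the header line
rows = [convert_row(coltypes, row) for row in f_csv]

# With real numeric types, arithmetic on the fields now works.
total = sum(shares * price for name, shares, price in rows)
print(rows)
print(total)
```

Passing the list of types as an argument keeps the helper independent of any one file layout.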

(b) Making dictionaries

Remember how the dict() function can easily make a dictionary if you have a sequence of key names and values? Let’s make a dictionary from the column headers:

>>> headers
['name', 'shares', 'price']
>>> converted
['AA', 100, 32.2]
>>> dict(zip(headers, converted))
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>

Of course, if you’re up on your list-comprehension fu, you can do the whole conversion in a single shot using a dictionary comprehension:

>>> { name: func(val) for name, func, val in zip(headers, coltypes, row) }
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>
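
To turn a whole file into dictionaries, the dictionary comprehension can be nested inside a list comprehension. This is only a sketch: sample is a made-up stand-in for Data/portfolio.csv, and records is an illustrative name.

```python
import csv
import io

# Stand-in for Data/portfolio.csv so the example is self-contained.
sample = 'name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n'
coltypes = [str, int, float]

f_csv = csv.reader(io.StringIO(sample))
headers = next(f_csv)

# One dictionary per row, keyed by the column headers.
records = [
    {name: func(val) for name, func, val in zip(headers, coltypes, row)}
    for row in f_csv
]

for rec in records:
    print(rec['name'], rec['shares'], rec['price'])
```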

(c) The Big Picture

Using the techniques in this exercise, you could write statements that easily convert fields from just about any column-oriented datafile into a Python dictionary. Just to illustrate, suppose you read data from a different datafile like this:

>>> f = open('Data/dowstocks.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> row = next(f_csv)
>>> headers
['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']
>>> row
['AA', '39.48', '6/11/2007', '9:36am', '-0.18', '39.67', '39.69', '39.45', '181800']
>>>

Let’s convert the fields using a similar trick:

>>> coltypes = [str,float,str,str,float,float,float,float,int]
>>> converted = [func(val) for func,val in zip(coltypes, row)]
>>> record = dict(zip(headers, converted))
>>> record
{'name': 'AA', 'price': 39.48, 'date': '6/11/2007', 'time': '9:36am',
'change': -0.18, 'open': 39.67, 'high': 39.69, 'low': 39.45,
'volume': 181800}
>>> record['name']
'AA'
>>> record['price']
39.48
>>>
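
Because any callable can go in the list, the conversion functions need not be built-in types. As a sketch, the hypothetical helper parse_date below turns the date field into a tuple of integers. It assumes the date is three slash-separated integers, as in the row shown above.

```python
def parse_date(text):
    # Hypothetical helper: '6/11/2007' -> (6, 11, 2007).
    # Assumes three integers separated by slashes.
    return tuple(int(part) for part in text.split('/'))

headers = ['name', 'price', 'date', 'time', 'change',
           'open', 'high', 'low', 'volume']
row = ['AA', '39.48', '6/11/2007', '9:36am', '-0.18',
       '39.67', '39.69', '39.45', '181800']

# parse_date sits in the list alongside the built-in types.
coltypes = [str, float, parse_date, str, float, float, float, float, int]

record = {name: func(val) for name, func, val in zip(headers, coltypes, row)}
print(record['date'])
```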

Spend some time to ponder what you’ve done in this exercise. We’ll revisit these ideas a little later.
