Exercise 2.6
(a) First-class Data
In the file Data/portfolio.csv, we read data organized as columns that look like this:
name,shares,price
"AA",100,32.20
"IBM",50,91.10
...
In previous code, we used the csv module to read the file, but still had to perform manual type conversions. For example:
for row in f_csv:
    name = row[0]
    shares = int(row[1])
    price = float(row[2])
This kind of conversion can also be performed in a more clever manner using some basic list operations. Make a Python list that contains the conversion functions you would use to convert each column into the appropriate type:
>>> coltypes = [str, int, float]
>>>
The reason you can even create this list is that everything in Python is "first-class." So, if you want to have a list of functions, that’s fine. The items in the list you created are functions for converting a value x into a given type (e.g., str(x), int(x), float(x)).
Now, read a row of data from the above file:
>>> import csv
>>> f = open('Data/portfolio.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> row = next(f_csv)
>>> row
['AA', '100', '32.20']
>>>
As noted, this row isn’t enough to do calculations because the types are wrong. For example:
>>> row[1] * row[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'str'
>>>
However, maybe the data can be paired up with the types you specified in coltypes. For example:
>>> coltypes[1]
<class 'int'>
>>> row[1]
'100'
>>>
Try converting one of the values:
>>> coltypes[1](row[1]) # Same as int(row[1])
100
>>>
Try converting a different value:
>>> coltypes[2](row[2]) # Same as float(row[2])
32.2
>>>
Zip the column types with the fields and look at the result:
>>> r = list(zip(coltypes, row))
>>> r
[(<class 'str'>, 'AA'), (<class 'int'>, '100'), (<class 'float'>, '32.20')]
>>>
You will notice that this has paired a type conversion with a value. For example, int is paired with the value '100'. The zipped list is useful if you want to perform conversions on all of the values, one after the other. Try this:
>>> converted = []
>>> for func, val in zip(coltypes, row):
...     converted.append(func(val))
...
>>> converted
['AA', 100, 32.2]
>>>
Make sure you understand what’s happening in the above code. In the loop, the func variable is one of the type conversion functions (e.g., str, int, etc.) and the val variable is one of the values, such as 'AA' or '100'. The expression func(val) simply converts a value (kind of like a type cast).
The above code can be compressed into a single list comprehension. Try this:
>>> converted = [func(val) for func,val in zip(coltypes,row)]
>>> converted
['AA', 100, 32.2]
>>>
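The same comprehension can be applied to every row in a file, not just one. The following is a minimal sketch rather than part of the original exercise: it assumes an in-memory stand-in for Data/portfolio.csv (built with io.StringIO) that contains only the two data rows shown at the start of this exercise.

```python
import csv
import io

# Assumption: an in-memory stand-in for Data/portfolio.csv, holding
# only the two data rows shown at the top of this exercise.
sample = io.StringIO('name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n')

coltypes = [str, int, float]

f_csv = csv.reader(sample)
headers = next(f_csv)            # Skip the header row

# Convert every remaining row, one field at a time
rows = [[func(val) for func, val in zip(coltypes, row)] for row in f_csv]

print(rows)    # [['AA', 100, 32.2], ['IBM', 50, 91.1]]
```

One thing to keep in mind is that zip() stops at the shortest input. If coltypes has fewer entries than a row has fields, the extra fields are silently dropped instead of raising an error.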
(b) Making dictionaries
Remember how the dict() function can easily make a dictionary if you have a sequence of key names and values? Let’s make a dictionary from the column headers:
>>> headers
['name', 'shares', 'price']
>>> converted
['AA', 100, 32.2]
>>> dict(zip(headers, converted))
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>
Of course, if you’re up on your list-comprehension fu, you can do the whole conversion in a single shot using a dict-comprehension:
>>> { name: func(val) for name, func, val in zip(headers, coltypes, row) }
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>>
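To see how the dict-comprehension scales beyond a single row, here is a hedged sketch that turns a whole file into a list of dictionaries. The read_records() helper is a hypothetical name introduced only for illustration, and the sample data is an in-memory stand-in for Data/portfolio.csv containing just the two rows shown earlier.

```python
import csv
import io

def read_records(lines, coltypes):
    '''
    Hypothetical helper (not part of the exercise): read CSV lines
    and return a list of dictionaries with converted values.
    '''
    f_csv = csv.reader(lines)
    headers = next(f_csv)        # First row supplies the dictionary keys
    return [
        {name: func(val) for name, func, val in zip(headers, coltypes, row)}
        for row in f_csv
    ]

# Assumption: stand-in for Data/portfolio.csv with the two rows shown earlier
sample = io.StringIO('name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n')

portfolio = read_records(sample, [str, int, float])
print(portfolio[0])              # {'name': 'AA', 'shares': 100, 'price': 32.2}
print(portfolio[1]['shares'])    # 50
```

Because each record is a dictionary, later code can refer to fields by name (e.g., rec['shares']) rather than by column position.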
(c) The Big Picture
Using the techniques in this exercise, you could write statements that easily convert fields from just about any column-oriented datafile into a Python dictionary. Just to illustrate, suppose you read data from a different datafile like this:
>>> f = open('Data/dowstocks.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> row = next(f_csv)
>>> headers
['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']
>>> row
['AA', '39.48', '6/11/2007', '9:36am', '-0.18', '39.67', '39.69', '39.45', '181800']
>>>
Let’s convert the fields using a similar trick:
>>> coltypes = [str,float,str,str,float,float,float,float,int]
>>> converted = [func(val) for func,val in zip(coltypes, row)]
>>> record = dict(zip(headers, converted))
>>> record
{'name': 'AA', 'price': 39.48, 'date': '6/11/2007', 'time': '9:36am',
 'change': -0.18, 'open': 39.67, 'high': 39.69, 'low': 39.45,
 'volume': 181800}
>>> record['name']
'AA'
>>> record['price']
39.48
>>>
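Since the entries in coltypes are just callables, nothing restricts them to built-in types. As a sketch of that idea, the example below swaps in a user-defined function for the date column. The parse_date() function is a hypothetical helper introduced here for illustration, and it assumes dates always look like '6/11/2007' (three integers separated by slashes).

```python
def parse_date(text):
    # Hypothetical helper: turn '6/11/2007' into the tuple (6, 11, 2007).
    # Assumes three slash-separated integers; anything else raises ValueError.
    return tuple(int(part) for part in text.split('/'))

# Headers and row copied from the dowstocks.csv example above
headers = ['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']
row = ['AA', '39.48', '6/11/2007', '9:36am', '-0.18', '39.67', '39.69', '39.45', '181800']

# Same list as before, except parse_date replaces str for the 'date' column
coltypes = [str, float, parse_date, str, float, float, float, float, int]

record = {name: func(val) for name, func, val in zip(headers, coltypes, row)}
print(record['date'])      # (6, 11, 2007)
print(record['volume'])    # 181800
```

The conversion loop itself does not change at all. Only the list of functions does, which is the practical payoff of functions being first-class.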
Spend some time to ponder what you’ve done in this exercise. We’ll revisit these ideas a little later.