Exercise 2.4
(a) Counting
Try some basic counting examples:
>>> for n in xrange(10):            # Count 0 ... 9
        print n,
0 1 2 3 4 5 6 7 8 9
>>> for n in xrange(10,0,-1):       # Count 10 ... 1
        print n,
10 9 8 7 6 5 4 3 2 1
>>> for n in xrange(0,10,2):        # Count 0, 2, ... 8
        print n,
0 2 4 6 8
>>>
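In each of the loops above, the stop value itself is never produced. A minimal sketch of that rule follows; it uses range() wrapped in list() so that it runs unchanged on both Python 2 and Python 3, and the variable names are only illustrative.

```python
# range(start, stop, step) counts from start up to, but not including, stop.
up = list(range(10))             # 0 ... 9 (10 is excluded)
down = list(range(10, 0, -1))    # 10 ... 1 (0 is excluded)
evens = list(range(0, 10, 2))    # 0, 2, ... 8 (10 is excluded)

print(up)
print(down)
print(evens)
```

To include the endpoint, extend the stop value by one step, for example range(0, 11, 2).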
(b) More sequence operations
Interactively experiment with some of the sequence reduction operations:
>>> data = [4, 9, 1, 25, 16, 100, 49]
>>> min(data)
1
>>> max(data)
100
>>> sum(data)
204
>>>
Try looping over the data:
>>> for x in data:
        print x
4
9
...
>>> for n,x in enumerate(data):
        print n,x
0 4
1 9
2 1
...
>>>
(c) Another enumerate() example
The file Data/missing.csv contains data for a stock
portfolio, but has some rows with missing data. Try the following
code sample that loops over all of the lines of the file, but prints a
warning message for all bad rows along with the associated row number.
>>> import csv
>>> f = open('Data/missing.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> for rowno, row in enumerate(f_csv, start=1):
        try:
            name = row[0]
            shares = int(row[1])
            price = float(row[2])
        except ValueError:
            print "Row %d: Couldn't convert: %s" % (rowno, row)
Row 4: Couldn't convert: ['MSFT', '', '51.23']
Row 7: Couldn't convert: ['IBM', '', '70.44']
>>>
In this example, the start=1 argument to enumerate() sets the starting
value for the count. In this case, we’re starting the count with row
number 1. If you don’t specify a starting value, enumerate() starts
counting from 0.
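A minimal sketch of the difference, using a short made-up list in place of the file rows (the names here are illustrative and not part of the exercise files):

```python
rows = ['AA', 'IBM', 'CAT']      # stand-in data, not read from a file

# Default: the count starts at 0
default_count = list(enumerate(rows))

# With start=1 only the numbering shifts; the items are unchanged
from_one = list(enumerate(rows, start=1))

print(default_count)    # [(0, 'AA'), (1, 'IBM'), (2, 'CAT')]
print(from_one)         # [(1, 'AA'), (2, 'IBM'), (3, 'CAT')]
```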
(d) Using the zip() function
In the file Data/portfolio.csv, the first line contains column headers. In all
previous code, we’ve simply been discarding them. For example:
>>> f = open('Data/portfolio.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> headers
['name', 'shares', 'price']
>>>
However, what if you could use the headers for something useful? This is where the
zip() function enters the picture. First try this to pair the file headers with a row of data:
>>> row = next(f_csv)
>>> row
['AA', '100', '32.20']
>>> zip(headers, row)
[ ('name', 'AA'), ('shares', '100'), ('price', '32.20') ]
>>>
Notice how zip() paired the column headers with the column values. This pairing is just
an intermediate step to building a dictionary. Now try this:
>>> record = dict(zip(headers, row))
>>> record
{'price': '32.20', 'name': 'AA', 'shares': '100'}
>>>
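One detail worth knowing when pairing headers with rows: zip() stops as soon as its shortest input runs out, so a row with fewer fields than headers produces a dictionary that is simply missing the trailing keys. The sketch below illustrates this with a hypothetical short row that is not taken from the data files.

```python
headers = ['name', 'shares', 'price']
row = ['AA', '100', '32.20']
short_row = ['IBM', '50']        # hypothetical row with no price field

record = dict(zip(headers, row))
partial = dict(zip(headers, short_row))

print(sorted(record.items()))
print(sorted(partial.items()))   # no 'price' key; zip() stopped early
print('price' in partial)        # False
```

Code that later reads partial['price'] would raise a KeyError, so it may be worth comparing len(row) with len(headers) before building the dictionary.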
This transformation is one of the most useful tricks to know about when
processing a lot of data files. For example,
suppose you wanted to make your report program work with various input files,
but without regard for the actual column number where the name, shares, and price
appear. Modify the read_portfolio() function in report.py so that it looks
like this:
# report.py
import csv

def read_portfolio(filename):
    '''
    Read a stock portfolio file into a list of dictionaries with keys
    name, shares, and price.
    '''
    portfolio = []
    f = open(filename)
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        record = dict(zip(headers, row))      # Turn the row into a dict
        stock = {                             # Pick out fields of interest
            'name': record['name'],
            'shares': int(record['shares']),
            'price': float(record['price'])
        }
        portfolio.append(stock)
    f.close()
    return portfolio
Now, try your function on a completely different data file, Data/portfoliodate.csv,
which looks like this:
name,date,time,shares,price
"AA","6/11/2007","9:50am",100,32.20
"IBM","5/13/2007","4:20pm",50,91.10
"CAT","9/23/2006","1:30pm",150,83.44
"MSFT","5/17/2007","10:30am",200,51.23
"GE","2/1/2006","10:45am",95,40.37
"MSFT","10/31/2006","12:05pm",50,65.10
"IBM","7/9/2006","3:15pm",100,70.44
>>> portfolio = read_portfolio('Data/portfoliodate.csv')
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>>
Modify your report.py program so that it reads data from
Data/portfoliodate.csv instead of Data/portfolio.csv. Amazingly,
you’ll find that your program still works even though the data file has a completely
different column format than before. That’s cool!
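The reason this works is that fields are looked up by header name rather than by position. The sketch below shows the idea without touching any files. It assumes a hypothetical helper, parse_portfolio(), that takes a list of CSV lines instead of a filename; csv.reader() accepts any iterable of strings, so the rest of the logic mirrors read_portfolio().

```python
import csv

def parse_portfolio(lines):
    '''
    Hypothetical variant of read_portfolio() that takes an iterable of
    CSV lines instead of a filename, so it can be tried without a file.
    '''
    rows = csv.reader(lines)
    headers = next(rows)
    portfolio = []
    for row in rows:
        record = dict(zip(headers, row))     # Pair each header with its value
        portfolio.append({
            'name': record['name'],
            'shares': int(record['shares']),
            'price': float(record['price']),
        })
    return portfolio

# The same holding in two different column layouts
plain = ['name,shares,price',
         'AA,100,32.20']
dated = ['name,date,time,shares,price',
         '"AA","6/11/2007","9:50am",100,32.20']

same = parse_portfolio(plain) == parse_portfolio(dated)
print(same)    # True: the extra date and time columns are ignored
```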
(e) Inverting a dictionary
A dictionary maps keys to values. For example, a dictionary of stock prices:
>>> prices = {
        'GOOG' : 490.1,
        'AA' : 23.45,
        'IBM' : 91.1,
        'MSFT' : 34.23
    }
>>>
If you use the items() method, you can get a list of (key,value) pairs:
>>> prices.items()
[('GOOG', 490.1), ('AA', 23.45), ('IBM', 91.1), ('MSFT', 34.23)]
>>>
However, what if you wanted to get a list of (value, key) pairs instead? Easy: use zip().
>>> pricelist = zip(prices.values(),prices.keys())
>>> pricelist
[(490.1, 'GOOG'), (23.45, 'AA'), (91.1, 'IBM'), (34.23, 'MSFT')]
>>>
Why would you do this? For one, it allows you to perform certain kinds of data processing on the dictionary data. For example:
>>> min(pricelist)
(23.45, 'AA')
>>> max(pricelist)
(490.1, 'GOOG')
>>> sorted(pricelist)
[(23.45, 'AA'), (34.23, 'MSFT'), (91.1, 'IBM'), (490.1, 'GOOG')]
>>>
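Since each result is a (value, key) tuple, it can be unpacked directly into two names. The sketch below is one way to do that; the variable names are illustrative. It wraps zip() in list() because under Python 3 zip() returns a one-shot iterator rather than a list, and calling min() followed by max() on the bare iterator would fail on the second call.

```python
prices = {
    'GOOG': 490.1,
    'AA': 23.45,
    'IBM': 91.1,
    'MSFT': 34.23,
}

# list() makes the pairs reusable on Python 3 as well as Python 2
pricelist = list(zip(prices.values(), prices.keys()))

lowest_price, lowest_name = min(pricelist)
highest_price, highest_name = max(pricelist)

print(lowest_name)     # AA
print(highest_name)    # GOOG
```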
This also illustrates an important feature of tuples. When used in comparisons, tuples are compared element-by-element starting with the first item (similar to how strings are compared character-by-character).
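A short sketch of that comparison rule, using made-up tuples: the second item is consulted only when the first items are equal.

```python
# The first items differ, so the second items are never consulted
first_decides = (23.45, 'ZZZ') < (34.23, 'AA')

# The first items tie, so the comparison falls through to the names
tie_broken = (91.1, 'IBM') < (91.1, 'MSFT')

# sorted() applies the same rule: by price first, then by name on ties
pairs = [(91.1, 'MSFT'), (23.45, 'AA'), (91.1, 'IBM')]
ordered = sorted(pairs)

print(first_decides)   # True
print(tie_broken)      # True
print(ordered)         # [(23.45, 'AA'), (91.1, 'IBM'), (91.1, 'MSFT')]
```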