Exercise 8.3
(a) Setting up a processing pipeline
A major strength of generators is that they let you build programs as processing pipelines, much like pipes on Unix systems. Experiment with this concept by performing these steps:
>>> from follow import follow
>>> import csv
>>> lines = follow('Data/stocklog.csv')
>>> rows = csv.reader(lines)
>>> for row in rows:
        print(row)
['BA', '98.35', '6/11/2007', '09:41.07', '0.16', '98.25', '98.35', '98.31', '158148']
['AA', '39.63', '6/11/2007', '09:41.07', '-0.03', '39.67', '39.63', '39.31', '270224']
['XOM', '82.45', '6/11/2007', '09:41.07', '-0.23', '82.68', '82.64', '82.41', '748062']
['PG', '62.95', '6/11/2007', '09:41.08', '-0.12', '62.80', '62.97', '62.61', '454327']
...
Well, that’s interesting. What you’re seeing here is that the output of the follow() function has been piped into the csv.reader() function, and we’re now getting a sequence of split rows. You can take it a step further if you use csv.DictReader() like this:
>>> headers = ['name','price','date','time','change','open','high','low','volume']
>>> lines = follow('Data/stocklog.csv')
>>> rows = csv.DictReader(lines, headers)
>>> for r in rows:
        print(r)
{'volume': '1022770', 'name': 'AXP', 'price': '62.82', 'high': '62.82', 'low': '62.38', 'time': '09:43.57', 'date': '6/11/2007', 'open': '62.79', 'change': '-0.22'}
{'volume': '179585', 'name': 'BA', 'price': '98.44', 'high': '98.44', 'low': '98.31', 'time': '09:43.57', 'date': '6/11/2007', 'open': '98.25', 'change': '0.25'}
{'volume': '314942', 'name': 'AA', 'price': '39.72', 'high': '39.72', 'low': '39.31', 'time': '09:43.57', 'date': '6/11/2007', 'open': '39.67', 'change': '0.06'}
...
This is kind of amazing if you think about it: you just stacked a single library call on top of the follow() function, and now the code is producing a sequence of dictionaries.
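Nothing here depends on follow() specifically: csv.reader() and csv.DictReader() accept any iterable that yields strings. The minimal sketch below swaps in sample_lines(), a hypothetical stand-in that yields two fixed lines built from the output shown earlier (the real Data/stocklog.csv may format its lines differently), so you can try the idea without a live log file:

```python
import csv

def sample_lines():
    # Hypothetical stand-in for follow(): yields fixed lines instead of
    # watching a file. The line format is an assumption for illustration.
    yield 'BA,98.35,6/11/2007,09:41.07,0.16,98.25,98.35,98.31,158148\n'
    yield 'AA,39.63,6/11/2007,09:41.07,-0.03,39.67,39.63,39.31,270224\n'

headers = ['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']

# Stage 1 -> csv.reader: each line becomes a list of strings
rows = list(csv.reader(sample_lines()))

# Stage 1 -> csv.DictReader: each line becomes a dictionary keyed by headers
records = list(csv.DictReader(sample_lines(), headers))

for record in records:
    print(record['name'], record['price'])
```

Note that every value is still a string at this point; the csv module does no type conversion for you.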
(b) Making more pipeline components
Let’s extend the whole idea by writing a generator function to convert various fields in the dictionaries:
def convert(rows, func, keylist):
    '''
    Apply type conversion to the value of selected keys in a sequence of dictionaries.
    '''
    for r in rows:
        for key in keylist:
            r[key] = func(r[key])
        yield r
Try this new function out as follows:
>>> lines = follow('Data/stocklog.csv')
>>> rows = csv.DictReader(lines, headers)
>>> rows = convert(rows, float, ['price','change','open','high','low'])
>>> rows = convert(rows, int, ['volume'])
>>> for r in rows:
        if r['change'] < 0:
            print(r)
... watch the output ...
Here, you’re starting to see a lot of different processing elements stacked together. Again, keep in mind that each generator is merely processing a stream of data.
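To see what "processing a stream" means in practice, here is a small sketch showing that stacking the stages does no work by itself; rows are only pulled through the pipeline one at a time as the consumer asks for them. The traced_rows() source and the pulled list are hypothetical helpers added for illustration, and the two sample rows reuse values from the output shown earlier:

```python
def convert(rows, func, keylist):
    '''
    Apply type conversion to the value of selected keys in a sequence of dictionaries.
    '''
    for r in rows:
        for key in keylist:
            r[key] = func(r[key])
        yield r

pulled = []    # names of the rows the source has produced so far

def traced_rows():
    # Hypothetical source (not part of the exercise): notes each row as it is produced.
    data = [
        {'name': 'BA', 'price': '98.35', 'change': '0.16', 'volume': '158148'},
        {'name': 'AA', 'price': '39.63', 'change': '-0.03', 'volume': '270224'},
    ]
    for d in data:
        pulled.append(d['name'])
        yield d

rows = traced_rows()
rows = convert(rows, float, ['price', 'change'])
rows = convert(rows, int, ['volume'])
pulled_before = list(pulled)         # snapshot: pipeline built, nothing requested yet

first = next(rows)                   # ask for exactly one row
pulled_after_first = list(pulled)    # snapshot: only the first row has been produced

falling = [r for r in rows if r['change'] < 0]    # drain the rest of the stream

print(pulled_before, pulled_after_first, pulled)
print(first)
print(falling)
```

This is why the same pipeline works on an endless source like follow(): no stage ever needs the whole data set in memory.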
(c) Packaging
Take the different steps you tried in the last example and put them into a function:
# follow.py
import csv
...
# Note: convert() from part (b) must also be defined in this file.
def parse_stock_data(lines):
    '''
    Take a sequence of lines and produce a sequence of dictionaries containing stock market data.
    '''
    headers = ['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']
    rows = csv.DictReader(lines, headers)
    rows = convert(rows, float, ['price','change','open','high','low'])
    rows = convert(rows, int, ['volume'])
    return rows
Now, try this new function and print a nicely formatted stock ticker:
>>> lines = follow('Data/stocklog.csv')
>>> rows = parse_stock_data(lines)
>>> for r in rows:
        if r['change'] < 0:
            print('%(name)10s %(price)10.2f %(change)10.2f' % r)
... watch the output ...
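Because parse_stock_data() only needs a sequence of lines, you can also check the packaged pipeline and the ticker format without a live file. The sketch below repeats convert() and parse_stock_data() so that it runs on its own, and feeds them a plain list of sample lines. The line format is an assumption reconstructed from the output shown in part (a), not a copy of Data/stocklog.csv:

```python
import csv

def convert(rows, func, keylist):
    for r in rows:
        for key in keylist:
            r[key] = func(r[key])
        yield r

def parse_stock_data(lines):
    headers = ['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']
    rows = csv.DictReader(lines, headers)
    rows = convert(rows, float, ['price', 'change', 'open', 'high', 'low'])
    rows = convert(rows, int, ['volume'])
    return rows

# Sample input (assumed format): an ordinary list works as well as a generator.
lines = [
    'BA,98.35,6/11/2007,09:41.07,0.16,98.25,98.35,98.31,158148',
    'AA,39.63,6/11/2007,09:41.07,-0.03,39.67,39.63,39.31,270224',
    'XOM,82.45,6/11/2007,09:41.07,-0.23,82.68,82.64,82.41,748062',
]

# %(key)10s looks up 'key' in the dictionary and right-justifies it in 10 columns;
# %(key)10.2f does the same for a float, rounded to 2 decimal places.
ticker = ['%(name)10s %(price)10.2f %(change)10.2f' % r
          for r in parse_stock_data(lines)
          if r['change'] < 0]

for line in ticker:
    print(line)
```

Only the AA and XOM rows are printed, since BA has a positive change in this sample.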