Exercise 8.3

Objectives:

  • Using generators to set up processing pipelines

Files Created: None

Files Modified: follow.py

Notes

For this exercise the stocksim.py program should still be running in the background. You’re going to use the follow() function you wrote in the previous exercise.
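
As a reminder, follow() is the generator you wrote in the previous exercise to watch a file for new lines, much like the Unix tail -f command. Your implementation may differ in its details, but a minimal sketch looks something like this:

import os
import time

def follow(filename):
    '''
    Generator that yields new lines as they are appended to a file.
    '''
    with open(filename, 'r') as f:
        f.seek(0, os.SEEK_END)       # skip over data already in the file
        while True:
            line = f.readline()
            if line == '':
                time.sleep(0.1)      # no new data yet; wait briefly and retry
                continue
            yield line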

(a) Setting up a processing pipeline

A major strength of generators is that they let you set up processing pipelines, much like pipes on Unix systems. Experiment with this concept by performing these steps:

>>> from follow import follow
>>> import csv
>>> lines = follow('Data/stocklog.csv')
>>> rows = csv.reader(lines)
>>> for row in rows:
        print(row)

['BA', '98.35', '6/11/2007', '09:41.07', '0.16', '98.25', '98.35', '98.31', '158148']
['AA', '39.63', '6/11/2007', '09:41.07', '-0.03', '39.67', '39.63', '39.31', '270224']
['XOM', '82.45', '6/11/2007', '09:41.07', '-0.23', '82.68', '82.64', '82.41', '748062']
['PG', '62.95', '6/11/2007', '09:41.08', '-0.12', '62.80', '62.97', '62.61', '454327']
...

Well, that’s interesting. What you’re seeing is that the output of the follow() function has been piped into csv.reader() and you’re now getting a sequence of split rows. You can take it a step further if you use csv.DictReader() like this:

>>> headers = ['name','price','date','time','change','open','high','low','volume']
>>> lines = follow('Data/stocklog.csv')
>>> rows = csv.DictReader(lines, headers)
>>> for r in rows:
        print(r)

{'volume': '1022770', 'name': 'AXP', 'price': '62.82', 'high': '62.82', 'low': '62.38', 'time': '09:43.57', 'date': '6/11/2007', 'open': '62.79', 'change': '-0.22'}
{'volume': '179585', 'name': 'BA', 'price': '98.44', 'high': '98.44', 'low': '98.31', 'time': '09:43.57', 'date': '6/11/2007', 'open': '98.25', 'change': '0.25'}
{'volume': '314942', 'name': 'AA', 'price': '39.72', 'high': '39.72', 'low': '39.31', 'time': '09:43.57', 'date': '6/11/2007', 'open': '39.67', 'change': '0.06'}
...

This is kind of amazing if you think about it—you just stacked a single library call on top of the follow() function and now the code is producing a sequence of dictionaries.

(b) Making more pipeline components

Let’s extend the whole idea by writing a generator function to convert various fields in the dictionaries:

def convert(rows, func, keylist):
    '''
    Apply type conversion to the value of selected keys in a sequence of dictionaries.
    '''
    for r in rows:
        for key in keylist:
            r[key] = func(r[key])
        yield r

Try this new function out as follows:

>>> lines = follow('Data/stocklog.csv')
>>> rows = csv.DictReader(lines, headers)
>>> rows = convert(rows, float, ['price','change','open','high','low'])
>>> rows = convert(rows, int, ['volume'])
>>> for r in rows:
        if r['change'] < 0:
            print(r)

... watch the output ...

Here, you’re starting to see a lot of different processing elements stacked together. Again, keep in mind that each generator is merely processing a stream of data.
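
If you want more practice making pipeline components, here is a hypothetical extra stage (not part of the exercise; negative_change is a made-up name) that only passes along declining stocks:

def negative_change(rows):
    '''
    Filter a sequence of row dictionaries, yielding only declining stocks.
    '''
    for r in rows:
        if r['change'] < 0:
            yield r

Dropped into the pipeline after the convert() stages, it would replace the explicit if test in the for-loop above.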

(c) Packaging

Take the different steps you tried in the last example and put them into a function:

# follow.py
...
def parse_stock_data(lines):
    '''
    Take a sequence of lines and produce a sequence of dictionaries containing stock market data.
    '''
    headers = ['name', 'price', 'date', 'time', 'change', 'open', 'high', 'low', 'volume']
    rows = csv.DictReader(lines, headers)
    rows = convert(rows, float, ['price','change','open','high','low'])
    rows = convert(rows, int, ['volume'])
    return rows

Now, try this new function and print a nicely formatted stock ticker:

>>> lines = follow('Data/stocklog.csv')
>>> rows = parse_stock_data(lines)
>>> for r in rows:
        if r['change'] < 0:
            print('%(name)10s %(price)10.2f %(change)10.2f' % r)

... watch the output ...

Discussion

Some lessons learned: you can create various generator functions and chain them together to build data-flow processing pipelines. In addition, you can package a series of pipeline stages into a single function call (for example, the parse_stock_data() function).

A good mental model for generator functions might be Lego blocks. You can make a collection of small iterator patterns and start stacking them together in various ways. It can be an extremely powerful way to program.
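
To make the Lego-block analogy concrete, here is one way the pieces from this exercise might snap together, using the hypothetical negative_change() filter sketched earlier:

lines = follow('Data/stocklog.csv')
rows = parse_stock_data(lines)
for r in negative_change(rows):
    print('%(name)10s %(price)10.2f %(change)10.2f' % r)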
