Exercise 8.4

Objectives:

  • Introduction to generator expressions—a generator version of a list comprehension.

Files Created: None

Files Modified: None

In Exercise 8.3 you wrote some code that followed lines being written to a log file and parsed them into a sequence of rows. For example:

>>> lines = follow('Data/stocklog.csv')
>>> rows = parse_stock_data(lines)
>>> for r in rows:
        print r

... watch the output (it might take 30 seconds for data to appear) ...

In this exercise, we’re going to further experiment with generators by writing some generator expressions that perform queries on the sequence of rows being generated by the parse_stock_data() function.

(a) Generator Expressions

Write a generator expression that only produces rows for stocks that have a negative change:

>>> lines = follow('Data/stocklog.csv')
>>> rows = parse_stock_data(lines)
>>> rows = (r for r in rows if r['change'] < 0)
>>> for r in rows:
        print '%(name)10s %(price)10.2f %(change)10.2f %(volume)10d' % r

... watch the output ...

In this example, the final rows variable is itself a generator. Unlike a list comprehension, it does not build a fully populated list; each row is produced only when the for loop asks for it.
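The difference between the two forms can be seen without a live log file. The following is only a sketch: sample_rows is made-up data standing in for the output of parse_stock_data(), and the field values are assumptions chosen for illustration.

```python
# Hypothetical sample data standing in for the rows from parse_stock_data()
sample_rows = [
    {'name': 'IBM',  'price': 102.86, 'change': -0.21, 'volume': 3500},
    {'name': 'MSFT', 'price': 29.95,  'change': 0.12,  'volume': 9000},
    {'name': 'GE',   'price': 37.14,  'change': -0.18, 'volume': 5200},
]

# List comprehension: the whole result list is built immediately
as_list = [r for r in sample_rows if r['change'] < 0]

# Generator expression: nothing is computed until a value is requested
as_gen = (r for r in sample_rows if r['change'] < 0)

first = next(as_gen)        # produces the first matching row on demand
remaining = list(as_gen)    # consumes whatever is left
leftover = list(as_gen)     # empty: a generator can only be iterated once
```

Because the generator is consumed as it is iterated, a second pass over it yields nothing. With an endless source such as follow(), that one-pass behavior is exactly what you want.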

(b) Generator Expressions in Function Arguments

Generator expressions are sometimes placed into function arguments. It looks a little weird at first, but try this experiment:

>>> nums = [1,2,3,4,5]
>>> sum([x*x for x in nums])    # A list comprehension
55
>>> sum(x*x for x in nums)      # A generator expression
55
>>>

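In the above example, both versions compute the same result, but the second version feeds values to sum() one at a time instead of first building an intermediate list. With five numbers the difference is negligible; with a large input the generator version typically uses far less memory.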
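One rough way to see the memory difference is to compare the size of the two objects with sys.getsizeof(). This is a sketch, not a precise measurement: getsizeof() reports only the size of the container object itself, not the integers it refers to, and the input size of 100000 is an arbitrary choice.

```python
import sys

nums = range(100000)

squares_list = [x*x for x in nums]    # every result is stored in memory
squares_gen = (x*x for x in nums)     # results are produced on demand

# Size of the list object grows with the number of items;
# the generator object stays small regardless of len(nums)
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# Both forms give the same total
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
```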

Try this tricky problem involving string joins:

>>> row = ('GOOG', 100, 490.1)
>>> print ','.join(row)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 1: expected string, int found
>>>

It doesn’t work because join() requires every item to be a string, and two of the tuple items are numbers. Try this version that converts each item:

>>> print ','.join(str(r) for r in row)
GOOG,100,490.1
>>>
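The same idiom extends to several rows at once. In this sketch, to_csv_line() is a hypothetical helper name and the second row is invented sample data; neither comes from the exercise files.

```python
def to_csv_line(row):
    # Convert every item to a string, then join the pieces with commas
    return ','.join(str(item) for item in row)

# Hypothetical sample rows in (name, shares, price) order
rows = [('GOOG', 100, 490.1), ('IBM', 50, 91.1)]

# A generator expression as the argument to join() again,
# this time producing one line of text per row
text = '\n'.join(to_csv_line(row) for row in rows)
```

Here text holds two comma-separated lines, one per tuple, and no intermediate list of strings is ever created.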

(c) Generator Expressions as Filters

Generator expressions are a convenient way to filter streams of data such as a file. For example, try this:

>>> f = open('Data/stocklog.csv')
>>> lines = (line for line in f if 'IBM' in line)
>>> for line in lines:
        print line,

... look at the output ...
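Filters like this can be chained, with each generator expression feeding the next. The sketch below uses a small in-memory list in place of the file, and the three-column layout of those lines is an assumption made for illustration; the real Data/stocklog.csv may be laid out differently.

```python
# Hypothetical lines standing in for the contents of the log file
lines = [
    '"IBM",102.86,-0.21\n',
    '"MSFT",29.95,0.12\n',
    '"IBM",102.77,-0.30\n',
]

# Each stage is lazy; no work happens until min() pulls values through
ibm_lines = (line for line in lines if 'IBM' in line)
fields = (line.strip().split(',') for line in ibm_lines)
prices = (float(f[1]) for f in fields)

lowest = min(prices)    # drives the whole pipeline, one line at a time
```

Only one line is in flight through the pipeline at any moment, so the same pattern works on files far larger than memory.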