Exercise 2.5

Objectives:

  • Learn how to quickly process list data

  • Apply an operation to all elements of a list

  • Perform database-style queries across items in a list

  • Introduction to "declarative programming" in Python

Files Created: None.

Files Modified: None.

As a preliminary step, run your report.py program. Then, at the Python interactive prompt, type statements to perform the operations described below. These operations perform various kinds of data reductions, transforms, and queries on the portfolio data.

(a) List comprehensions

Try a few simple list comprehensions just to become familiar with the syntax.

>>> nums = [1,2,3,4]
>>> squares = [x*x for x in nums]
>>> squares
[1, 4, 9, 16]
>>> twice = [2*x for x in nums if x > 2]
>>> twice
[6, 8]
>>>

Notice how each list comprehension creates a new list with the data suitably transformed or filtered.
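For comparison, here is a minimal sketch of the explicit loops that the two comprehensions above replace. The names squares_loop and twice_loop are chosen here for illustration only.

```python
nums = [1, 2, 3, 4]

# Equivalent of: squares = [x*x for x in nums]
squares_loop = []
for x in nums:
    squares_loop.append(x * x)

# Equivalent of: twice = [2*x for x in nums if x > 2]
twice_loop = []
for x in nums:
    if x > 2:                      # the "if" clause filters items
        twice_loop.append(2 * x)

print(squares_loop)
print(twice_loop)
```

The comprehension form states the same transformation and filter in one expression, without the bookkeeping of creating an empty list and appending to it.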

(b) Sequence Reductions

Compute the total cost of the portfolio using a single Python statement.

>>> cost = sum([s['shares']*s['price'] for s in portfolio])
>>> cost
44671.15
>>>

After you have done that, show how you can compute the current value of the portfolio using a single statement.

>>> value = sum([s['shares']*prices[s['name']] for s in portfolio])
>>> value
28686.1
>>>

Discussion

Both of the above operations are examples of a map-reduction. The list comprehension maps an operation across the list. For example:

>>> [s['shares']*s['price'] for s in portfolio]
[3220.0000000000005, 4555.0, 12516.0, 10246.0, 3835.1499999999996, 3254.9999999999995, 7044.0]
>>>

The sum() function then performs a reduction across the result (at the interactive prompt, _ holds the value of the last expression):

>>> sum(_)
44671.15
>>>
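As a side note, the intermediate list is not strictly required: sum() accepts any iterable, so the map step can also be written as a generator expression. The sketch below assumes a portfolio shaped like the one built by report.py; the literal data is reconstructed from the output shown in this exercise, and the names costs, total, and total_gen are illustrative.

```python
# Sample data matching the holdings shown in this exercise
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.20},
    {'name': 'IBM', 'shares': 50, 'price': 91.10},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
    {'name': 'MSFT', 'shares': 200, 'price': 51.23},
    {'name': 'GE', 'shares': 95, 'price': 40.37},
    {'name': 'MSFT', 'shares': 50, 'price': 65.10},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

# Map step: one cost per holding
costs = [s['shares'] * s['price'] for s in portfolio]

# Reduce step: collapse the list to a single number
total = sum(costs)

# Same computation with a generator expression (no temporary list is built)
total_gen = sum(s['shares'] * s['price'] for s in portfolio)

print(total, total_gen)
```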

With this knowledge, you are now ready to go form a big-data startup company.

(c) Data Queries

Try the following examples of various data queries. First, a list of all portfolio holdings with more than 100 shares.

>>> more100 = [s for s in portfolio if s['shares'] > 100]
>>> more100
[{'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}]
>>>

All portfolio holdings for MSFT and IBM stocks.

>>> msftibm = [s for s in portfolio if s['name'] in ['MSFT','IBM']]
>>> msftibm
[{'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>>

A list of all portfolio holdings that cost more than $10000.

>>> cost10k = [s for s in portfolio if s['shares']*s['price'] > 10000]
>>> cost10k
[{'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}]
>>>
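Conditions can also be combined with and/or inside the if clause. A small sketch, using sample data reconstructed from the output above; the variable name big_msft is hypothetical:

```python
# Sample data matching the holdings shown in this exercise
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.20},
    {'name': 'IBM', 'shares': 50, 'price': 91.10},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
    {'name': 'MSFT', 'shares': 200, 'price': 51.23},
    {'name': 'GE', 'shares': 95, 'price': 40.37},
    {'name': 'MSFT', 'shares': 50, 'price': 65.10},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

# MSFT holdings of at least 100 shares: both conditions must be true
big_msft = [s for s in portfolio
            if s['name'] == 'MSFT' and s['shares'] >= 100]

print(big_msft)
```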

(d) Data Extraction

Show how you could build a list of tuples (name, shares) where name and shares are taken from portfolio.

>>> name_shares = [(s['name'], s['shares']) for s in portfolio]
>>> name_shares
[('AA', 100), ('IBM', 50), ('CAT', 150), ('MSFT', 200), ('GE', 95), ('MSFT', 50), ('IBM', 100)]
>>>

Show how you could create a set of all unique stock symbols in portfolio.

>>> names = set([s['name'] for s in portfolio])
>>> names
set(['AA', 'GE', 'IBM', 'MSFT', 'CAT'])
>>>

This last step can be expressed more compactly with a feature known as a "set comprehension". Simply write a list comprehension, but change the square brackets ([ ]) to curly braces ({ }).

>>> names = { s['name'] for s in portfolio }
>>> names
set(['AA', 'GE', 'IBM', 'MSFT', 'CAT'])
>>>

Build a dictionary that maps the name of a stock to the total number of shares held.

>>> holdings = dict.fromkeys(names, 0)
>>> holdings
{'AA': 0, 'GE': 0, 'IBM': 0, 'MSFT': 0, 'CAT': 0}
>>> for name, shares in name_shares:
...     holdings[name] += shares
...
>>> holdings
{'AA': 100, 'GE': 95, 'IBM': 150, 'MSFT': 250, 'CAT': 150}
>>>

The dict.fromkeys() method creates a dictionary from a set of keys, initializing all of the values to a value you provide. This was done to set up initial counts for tabulating the total number of shares in the for loop that follows. This initialization could also be performed using a dictionary comprehension:

>>> holdings = { name:0 for name in names }
>>> holdings
{'AA': 0, 'GE': 0, 'IBM': 0, 'MSFT': 0, 'CAT': 0}
>>>
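Putting the steps of this part together, here is a self-contained sketch of the tabulation. The sample data is reconstructed from the output shown above. The second half uses collections.Counter from the standard library, which is not part of this exercise; it is shown only as an alternative that treats missing keys as 0, so no initialization step is needed.

```python
from collections import Counter

# Sample data matching the holdings shown in this exercise
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.20},
    {'name': 'IBM', 'shares': 50, 'price': 91.10},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
    {'name': 'MSFT', 'shares': 200, 'price': 51.23},
    {'name': 'GE', 'shares': 95, 'price': 40.37},
    {'name': 'MSFT', 'shares': 50, 'price': 65.10},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

# Step 1: unique names via a set comprehension
names = {s['name'] for s in portfolio}

# Step 2: zeroed counts via a dictionary comprehension
holdings = {name: 0 for name in names}

# Step 3: tabulate the shares for each name
for s in portfolio:
    holdings[s['name']] += s['shares']

# Alternative (not used in the exercise): Counter needs no initialization
totals = Counter()
for s in portfolio:
    totals[s['name']] += s['shares']

print(holdings == dict(totals))
```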

(e) Extracting Data From CSV Files (Advanced)

Knowing how to combine list, set, and dictionary comprehensions is useful in many forms of data processing. Here’s an example that shows how to extract selected columns from a CSV file.

First, read a row of header information from a CSV file:

>>> import csv
>>> f = open('Data/portfoliodate.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> headers
['name', 'date', 'time', 'shares', 'price']
>>>

Next, define a variable that lists the columns that you actually care about:

>>> columns = ['name', 'shares', 'price']
>>>

Now, locate the indices of the above columns in the source CSV file:

>>> indices = [ (colname, headers.index(colname)) for colname in columns ]
>>> indices
[('name', 0), ('shares', 3), ('price', 4)]
>>>

Finally, read a row of data and turn it into a dictionary using a dictionary comprehension:

>>> row = next(f_csv)
>>> record = { colname: row[index] for colname, index in indices }     # dict-comprehension
>>> record
{'price': '32.20', 'name': 'AA', 'shares': '100'}
>>>

If you’re feeling comfortable with what just happened, read the rest of the file:

>>> portfolio = [ {colname: row[index] for colname, index in indices} for row in f_csv ]
>>> portfolio
[{'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
>>>

Oh my, you just reduced much of the read_portfolio() function to a single statement.

Discussion

List comprehensions are commonly used in Python as an efficient means for transforming, filtering, or collecting data. Because the syntax is so compact, it is easy to go overboard, so try to keep each list comprehension as simple as possible. It’s okay to break things into multiple steps. For example, it’s not clear that you would want to spring that last example on your unsuspecting co-workers.
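One way to break the CSV example into multiple steps is to wrap it in a small function. The sketch below is only an illustration: select_columns is a hypothetical helper name, and the in-memory CSV text stands in for Data/portfoliodate.csv, with placeholder date and time values.

```python
import csv
import io

def select_columns(lines, columns):
    """Read CSV rows from an iterable of lines, keeping only the named columns."""
    rows = csv.reader(lines)
    headers = next(rows)               # first row holds the column names

    # Pair each wanted column name with its position in the file
    indices = [(col, headers.index(col)) for col in columns]

    records = []
    for row in rows:
        # One small dictionary comprehension per row
        record = {col: row[i] for col, i in indices}
        records.append(record)
    return records

# Stand-in for the real file; the date and time values are placeholders
sample = io.StringIO(
    "name,date,time,shares,price\n"
    "AA,6/11/2007,9:50am,100,32.20\n"
    "IBM,5/13/2007,4:20pm,50,91.10\n"
)

portfolio = select_columns(sample, ['name', 'shares', 'price'])
print(portfolio)
```

As in the interactive session, the values come back as strings; converting shares and price to numbers would be a separate step.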

That said, knowing how to quickly manipulate data is an incredibly useful skill. There are numerous situations where you might have to solve some kind of one-off problem involving data imports, exports, extraction, and so forth. Becoming a master of list comprehensions can substantially reduce the time spent devising a solution.
