Exercise 4.5

Objectives:

  • Play with the collections module

Files Created: None

Files Modified: None

The collections module is one of the most useful library modules for special-purpose data handling problems such as tabulating and indexing. In this exercise, we’ll look at a few simple examples. Start by reading a portfolio of stocks using your report.py program:

>>> import report
>>> portfolio = report.read_portfolio('Data/portfolio.csv')
>>>

(a) Tabulating with Counters

Suppose you wanted to tabulate the total number of shares of each stock. This is easy using Counter objects. Try it:

>>> from collections import Counter
>>> holdings = Counter()
>>> for s in portfolio:
        holdings[s['name']] += s['shares']

>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>>

Carefully observe how the multiple entries for MSFT and IBM in portfolio get combined into a single entry here. You can use a Counter just like a dictionary to retrieve individual values:

>>> holdings['IBM']
150
>>> holdings['MSFT']
250
>>>

If you want to rank the values, do this:

>>> # Get three most held stocks
>>> holdings.most_common(3)
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]
>>>
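
If report.py is not at hand, the same tabulation can be tried on inline data. The sketch below assumes a hand-typed list of dictionaries standing in for the result of read_portfolio(). The rows are illustrative sample data chosen to reproduce the totals shown above, not a copy of Data/portfolio.csv.

```python
from collections import Counter

# Assumed stand-in for report.read_portfolio(); illustrative rows only.
portfolio = [
    {'name': 'AA', 'shares': 100},
    {'name': 'IBM', 'shares': 50},
    {'name': 'CAT', 'shares': 150},
    {'name': 'MSFT', 'shares': 200},
    {'name': 'GE', 'shares': 95},
    {'name': 'MSFT', 'shares': 50},
    {'name': 'IBM', 'shares': 100},
]

# A missing key counts as 0, so += works without any setup.
holdings = Counter()
for s in portfolio:
    holdings[s['name']] += s['shares']

# The three largest totals, as (name, shares) tuples.
top_three = holdings.most_common(3)

# Reading a missing key returns 0 and does not add the key.
goog_shares = holdings['GOOG']
```

Unlike a plain dictionary, a Counter never raises KeyError on lookup, which is what makes the one-line accumulation in the loop possible.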

Let’s grab another portfolio of stocks and make a new Counter:

>>> portfolio2 = report.read_portfolio('Data/portfolio2.csv')
>>> holdings2 = Counter()
>>> for s in portfolio2:
        holdings2[s['name']] += s['shares']

>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>>

Finally, let’s combine the two sets of holdings with one simple operation:

>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>> combined = holdings + holdings2
>>> combined
Counter({'MSFT': 275, 'HPQ': 250, 'GE': 220, 'AA': 150, 'IBM': 150, 'CAT': 150})
>>>
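
Addition is only one of the operations counters support. As a minimal sketch, reusing the totals printed above as literal values, the following also shows subtraction and in-place updating. The names difference and in_place are introduced here for illustration only.

```python
from collections import Counter

holdings = Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
holdings2 = Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})

# + builds a new Counter; neither operand is modified.
combined = holdings + holdings2

# - subtracts counts and drops any result that is zero or negative,
# so GE (95 - 125) and HPQ (0 - 250) do not appear in the result.
difference = holdings - holdings2

# update() adds counts into an existing Counter in place.
in_place = Counter(holdings)   # copy, so holdings itself stays unchanged
in_place.update(holdings2)
```

Use + when the originals must be preserved and update() when accumulating into a running total.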

This is only a small taste of what counters provide. However, if you ever find yourself needing to tabulate values, you should consider using one.

(b) Grouping and Indexing Data

Instead of merely summing up the total number of shares, suppose you wanted to group all of the portfolio entries by stock symbol. Here is an easy way to do it using defaultdict objects:

>>> from collections import defaultdict
>>> holdings_by_symbol = defaultdict(list)
>>> for s in portfolio:
        holdings_by_symbol[s['name']].append(s)

>>> holdings_by_symbol['IBM']
[{'name': 'IBM', 'shares': 50, 'price': 91.1}, {'name': 'IBM', 'shares': 100, 'price': 70.44}]
>>> holdings_by_symbol['AA']
[{'name': 'AA', 'shares': 100, 'price': 32.2}]
>>> holdings_by_symbol['GOOG']
[]
>>>
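
One consequence of the 'GOOG' lookup above is worth seeing in isolation: indexing a missing key stores a new empty list under that key. The sketch below assumes a three-row sample portfolio, using the IBM and AA entries shown above, and contrasts indexing with the in operator and the get() method, neither of which creates an entry.

```python
from collections import defaultdict

# Assumed sample rows standing in for report.read_portfolio().
portfolio = [
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

holdings_by_symbol = defaultdict(list)
for s in portfolio:
    holdings_by_symbol[s['name']].append(s)

# Each group is an ordinary list of the original dictionaries.
ibm_shares = sum(s['shares'] for s in holdings_by_symbol['IBM'])

# Membership tests and get() do not create an entry for a missing key.
present_before = 'GOOG' in holdings_by_symbol
via_get = holdings_by_symbol.get('GOOG')

# Indexing does: the empty list is stored under 'GOOG' as a side effect.
via_index = holdings_by_symbol['GOOG']
present_after = 'GOOG' in holdings_by_symbol
```

If you only want to check whether a symbol exists, prefer in or get() so the dictionary does not fill up with empty groups.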

The key feature of a defaultdict is that it automatically creates the initial dictionary value for you the first time a missing key is accessed. For example:

>>> d = defaultdict(list)
>>> d['x']
[]
>>> d['y']
[]
>>> d
defaultdict(<class 'list'>, {'x': [], 'y': []})
>>>

The fact that the initial item is created automatically makes it easier to combine insertion with other operations such as a list append. For example:

>>> d['x'].append(10)
>>> d['z'].append(42)
>>> d
defaultdict(<class 'list'>, {'x': [10], 'y': [], 'z': [42]})
>>>

Although default dictionaries might seem a bit odd at first, they can be one of the most useful objects in the collections module. Consider using one whenever you want a dictionary that holds lists, sets, or other dictionaries.
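
As a rough illustration of that advice, the sketch below applies the same pattern with set and int as the default factory, and shows the plain-dictionary setdefault() idiom that defaultdict(list) replaces. The rows list and all variable names are hypothetical sample data, not part of the exercise files.

```python
from collections import defaultdict

# Hypothetical (symbol, shares) pairs used only for this illustration.
rows = [('IBM', 50), ('AA', 100), ('IBM', 100), ('IBM', 50)]

# Plain dict: setdefault() supplies the empty list on first use.
grouped = {}
for name, shares in rows:
    grouped.setdefault(name, []).append(shares)

# defaultdict(set): collect the distinct lot sizes per symbol.
unique_lots = defaultdict(set)
for name, shares in rows:
    unique_lots[name].add(shares)

# defaultdict(int): missing keys start at 0, giving a running total.
totals = defaultdict(int)
for name, shares in rows:
    totals[name] += shares
```

The argument to defaultdict is any callable taking no arguments; it is called once for each missing key to produce that key's starting value.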
