Exercise 4.5
The collections module might be one of the most useful library modules for dealing with
special-purpose data handling problems such as tabulating and indexing. In this
exercise, we’ll look at a few simple examples. Start by reading a portfolio of stocks using
your report.py program:
>>> import report
>>> portfolio = report.read_portfolio('Data/portfolio.csv')
>>>
(a) Tabulating with Counters
Suppose you wanted to tabulate the total number of shares of each stock. This is easy using
Counter objects. Try it:
>>> from collections import Counter
>>> holdings = Counter()
>>> for s in portfolio:
...     holdings[s['name']] += s['shares']
...
>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>>
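If Data/portfolio.csv is not at hand, the same tabulation can be tried on a small in-memory list. The sketch below assumes a hypothetical sample_portfolio with the same list-of-dictionaries layout that read_portfolio() returns, and compares the Counter loop with the plain-dictionary code it replaces.

```python
from collections import Counter

# Hypothetical sample data, assumed to match the list-of-dicts layout
# returned by report.read_portfolio().
sample_portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

# With a Counter, a missing key counts as 0, so += works right away.
holdings = Counter()
for s in sample_portfolio:
    holdings[s['name']] += s['shares']

# The same tabulation with a plain dict needs an explicit default.
plain = {}
for s in sample_portfolio:
    plain[s['name']] = plain.get(s['name'], 0) + s['shares']

print(holdings)                  # Counter({'IBM': 150, 'AA': 100})
print(dict(holdings) == plain)   # True
```

The Counter version is shorter mainly because it removes the need to handle the first occurrence of each name as a special case.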
Carefully observe how the multiple entries for MSFT and IBM in portfolio are each
combined into a single entry here. You can use a Counter just like a dictionary to retrieve
individual values:
>>> holdings['IBM']
150
>>> holdings['MSFT']
250
>>>
If you want to rank the values, do this:
>>> # Get three most held stocks
>>> holdings.most_common(3)
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]
>>>
Let’s grab another portfolio of stocks and make a new Counter:
>>> portfolio2 = report.read_portfolio('Data/portfolio2.csv')
>>> holdings2 = Counter()
>>> for s in portfolio2:
...     holdings2[s['name']] += s['shares']
...
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>>
Finally, let’s combine the holdings with one simple operation:
>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>> combined = holdings + holdings2
>>> combined
Counter({'MSFT': 275, 'HPQ': 250, 'GE': 220, 'AA': 150, 'IBM': 150, 'CAT': 150})
>>>
This is only a small taste of what counters provide. Still, whenever you find yourself needing to tabulate values, you should consider using one.
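A few other Counter operations are worth trying. The sketch below uses made-up share counts rather than the portfolio files. It shows the difference between the - operator, which drops results that are zero or negative, and the subtract() method, which keeps them.

```python
from collections import Counter

# Made-up share counts for illustration only.
a = Counter({'MSFT': 250, 'IBM': 150, 'GE': 95})
b = Counter({'GE': 125, 'MSFT': 25, 'HPQ': 250})

# Missing keys read as 0, and the lookup does not add them.
print(a['HPQ'])            # 0
print('HPQ' in a)          # False

# The - operator keeps only positive results.
diff = a - b
print(diff)                # Counter({'MSFT': 225, 'IBM': 150})

# subtract() works in place and keeps zero and negative counts.
c = Counter(a)
c.subtract(b)
print(c['GE'], c['HPQ'])   # -30 -250

# update() adds counts in place, one key at a time.
d = Counter(a)
d.update(b)
print(d == a + b)          # True
print(sum(d.values()))     # 895
```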
(b) Grouping and Indexing Data
Instead of merely summing up the total number of shares, suppose you wanted to group all of the portfolio entries
by stock symbol. Here is an easy way to do it using defaultdict objects:
>>> from collections import defaultdict
>>> holdings_by_symbol = defaultdict(list)
>>> for s in portfolio:
...     holdings_by_symbol[s['name']].append(s)
...
>>> holdings_by_symbol['IBM']
[{'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>> holdings_by_symbol['AA']
[{'price': 32.2, 'name': 'AA', 'shares': 100}]
>>> holdings_by_symbol['GOOG']
[]
>>>
The key feature of a defaultdict is that it automatically creates an initial value
(here, an empty list) for any key that is looked up but not yet present.
For example:
>>> d = defaultdict(list)
>>> d['x']
[]
>>> d['y']
[]
>>> d
defaultdict(<class 'list'>, {'x': [], 'y': []})
>>>
The fact that the initial item is created automatically makes it easier to combine insertion with other operations such as a list append. For example:
>>> d['x'].append(10)
>>> d['z'].append(42)
>>> d
defaultdict(<class 'list'>, {'x': [10], 'y': [], 'z': [42]})
>>>
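For comparison, the same append-on-first-use pattern can be written with a plain dictionary and setdefault(). The sketch below uses a hypothetical list of (name, shares) pairs. It also shows a side effect worth knowing about: merely looking up a missing key with [] stores a new empty list under that key, whereas the in operator and get() do not.

```python
from collections import defaultdict

# Hypothetical (name, shares) pairs for illustration.
rows = [('IBM', 50), ('AA', 100), ('IBM', 100)]

# Plain dict: setdefault() supplies the empty list on first use.
plain = {}
for name, shares in rows:
    plain.setdefault(name, []).append(shares)

# defaultdict: the list factory is called automatically for missing keys.
grouped = defaultdict(list)
for name, shares in rows:
    grouped[name].append(shares)

print(grouped == plain)        # True
print('GOOG' in grouped)       # False
print(grouped.get('GOOG'))     # None
print(grouped['GOOG'])         # []
print('GOOG' in grouped)       # True
```

If you only want to test whether a key exists, prefer in or get() so that the dictionary does not accumulate empty entries.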
Although default dictionaries might seem a bit odd at first, they can be one of the most useful objects
in the collections module. Consider their use whenever you think you might want to make a dictionary
that holds lists, sets, or other dictionaries.
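As a minimal sketch of the other container types mentioned above, the following uses made-up trade records to build a dictionary of sets and a dictionary of dictionaries. The account field and the sample values are assumptions for illustration and are not part of report.py.

```python
from collections import defaultdict

# Made-up records; 'account' is a hypothetical field.
trades = [
    {'account': 'A1', 'name': 'IBM', 'shares': 50},
    {'account': 'A2', 'name': 'IBM', 'shares': 100},
    {'account': 'A1', 'name': 'AA', 'shares': 100},
    {'account': 'A1', 'name': 'IBM', 'shares': 25},
]

# Dictionary of sets: which accounts hold each symbol?
accounts_by_symbol = defaultdict(set)
for t in trades:
    accounts_by_symbol[t['name']].add(t['account'])

# Dictionary of dictionaries: shares per symbol, per account.
# The lambda makes each missing account start as a defaultdict(int).
shares = defaultdict(lambda: defaultdict(int))
for t in trades:
    shares[t['account']][t['name']] += t['shares']

print(sorted(accounts_by_symbol['IBM']))   # ['A1', 'A2']
print(shares['A1']['IBM'])                 # 75
print(shares['A2']['IBM'])                 # 100
```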