Exericse Data.2
In the previous exercise, we turned the stock portfolio into a dictionary that held numpy arrays. Each array was used to the store the data for an individual column. If you ever find yourself needing to manipulate column-oriented data, you should probably be using Pandas instead. Fortunately, you already have the general idea. Let’s jump in.
(a) Reading a CSV file
Let’s just grab a CSV file:
>>> import pandas
>>> portfolio = pandas.read_csv('Data/portfolio.csv')
>>> portfolio
name shares price
0 AA 100 32.20
1 IBM 50 91.10
2 CAT 150 83.44
3 MSFT 200 51.23
4 GE 95 40.37
5 MSFT 50 65.10
6 IBM 100 70.44
>>>
The result of this operation is a "DataFrame". What’s a DataFrame you ask? Well, it’s a lot like the dictionary of columns you created in the last exercise. View the individual columns:
>>> portfolio['name']
0 AA
1 IBM
2 CAT
3 MSFT
4 GE
5 MSFT
6 IBM
Name: name, dtype: object
>>> portfolio['shares']
0 100
1 50
2 150
3 200
4 95
5 50
6 100
Name: shares, dtype: int64
>>> portfolio['price']
0 32.20
1 91.10
2 83.44
3 51.23
4 40.37
5 65.10
6 70.44
Name: price, dtype: float64
>>>
Each column is like an array. Perform a calculation involving the columns:
>>> cost = portfolio['shares']*portfolio['price']
>>> cost
0 3220.00
1 4555.00
2 12516.00
3 10246.00
4 3835.15
5 3255.00
6 7044.00
dtype: float64
>>>
Add the new column to the dataframe so you can keep it for later:
>>> portfolio['cost'] = cost
>>> portfolio
name shares price cost
0 AA 100 32.20 3220.00
1 IBM 50 91.10 4555.00
2 CAT 150 83.44 12516.00
3 MSFT 200 51.23 10246.00
4 GE 95 40.37 3835.15
5 MSFT 50 65.10 3255.00
6 IBM 100 70.44 7044.00
>>>
Perform some reductions:
>>> portfolio['cost'].sum()
44671.150000000001
>>> portfolio['price'].min()
32.200000000000003
>>>
(b) Perform some queries on the data
This is going to mirror what we did in the last exercise:
>>> # Compare all of the shares
>>> less100 = portfolio['shares'] < 100
>>> less100
0 False
1 True
2 False
3 False
4 True
5 True
6 False
Name: shares, dtype: bool
>>>
>>> # Find all of the matching portfolio entries
>>> portfolio[less100]
name shares price cost
1 IBM 50 91.10 4555.00
4 GE 95 40.37 3835.15
5 MSFT 50 65.10 3255.00
>>>
This last operation produced a new dataframe. You can use it to continue performing more operations.
(c) Grouping and aggregation
Group all of the data by stock name and tabulate the total number of shares:
>>> totals = portfolio.groupby('name')['shares'].sum()
>>> totals
name
AA 100
CAT 150
GE 95
IBM 150
MSFT 250
Name: shares, dtype: int64
>>> totals['AA']
100
>>> totals['IBM']
150
>>>
This last example might take a bit of time to digest. However, it’s a good illustration of what’s possible in very little code. Again, we haven’t even really scratched the surface.