Exericse Data.2

Objectives:

  • Try a few simple examples involving Pandas

Files Created: None

Files Modified: None

In the previous exercise, we turned the stock portfolio into a dictionary that held numpy arrays. Each array was used to the store the data for an individual column. If you ever find yourself needing to manipulate column-oriented data, you should probably be using Pandas instead. Fortunately, you already have the general idea. Let’s jump in.

(a) Reading a CSV file

Let’s just grab a CSV file:

>>> import pandas
>>> portfolio = pandas.read_csv('Data/portfolio.csv')
>>> portfolio
   name  shares  price
0    AA     100  32.20
1   IBM      50  91.10
2   CAT     150  83.44
3  MSFT     200  51.23
4    GE      95  40.37
5  MSFT      50  65.10
6   IBM     100  70.44
>>>

The result of this operation is a "DataFrame". What’s a DataFrame you ask? Well, it’s a lot like the dictionary of columns you created in the last exercise. View the individual columns:

>>> portfolio['name']
0      AA
1     IBM
2     CAT
3    MSFT
4      GE
5    MSFT
6     IBM
Name: name, dtype: object
>>> portfolio['shares']
0    100
1     50
2    150
3    200
4     95
5     50
6    100
Name: shares, dtype: int64
>>> portfolio['price']
0    32.20
1    91.10
2    83.44
3    51.23
4    40.37
5    65.10
6    70.44
Name: price, dtype: float64
>>>

Each column is like an array. Perform a calculation involving the columns:

>>> cost = portfolio['shares']*portfolio['price']
>>> cost
0     3220.00
1     4555.00
2    12516.00
3    10246.00
4     3835.15
5     3255.00
6     7044.00
dtype: float64
>>>

Add the new column to the dataframe so you can keep it for later:

>>> portfolio['cost'] = cost
>>> portfolio
   name  shares  price      cost
0    AA     100  32.20   3220.00
1   IBM      50  91.10   4555.00
2   CAT     150  83.44  12516.00
3  MSFT     200  51.23  10246.00
4    GE      95  40.37   3835.15
5  MSFT      50  65.10   3255.00
6   IBM     100  70.44   7044.00
>>>

Perform some reductions:

>>> portfolio['cost'].sum()
44671.150000000001
>>> portfolio['price'].min()
32.200000000000003
>>>

(b) Perform some queries on the data

This is going to mirror what we did in the last exercise:

>>> # Compare all of the shares
>>> less100 = portfolio['shares'] < 100
>>> less100
0    False
1     True
2    False
3    False
4     True
5     True
6    False
Name: shares, dtype: bool
>>>

>>> # Find all of the matching portfolio entries
>>> portfolio[less100]
   name  shares  price     cost
1   IBM      50  91.10  4555.00
4    GE      95  40.37  3835.15
5  MSFT      50  65.10  3255.00
>>>

This last operation produced a new dataframe. You can use it to continue performing more operations.

(c) Grouping and aggregation

Group all of the data by stock name and tabulate the total number of shares:

>>> totals = portfolio.groupby('name')['shares'].sum()
>>> totals
name
AA      100
CAT     150
GE       95
IBM     150
MSFT    250
Name: shares, dtype: int64
>>> totals['AA']
100
>>> totals['IBM']
150
>>>

This last example might take a bit of time to digest. However, it’s a good illustration of what’s possible in very little code. Again, we haven’t even really scratched the surface.

Links

[ Back | Next | Index ]