Exercise 1.6
(a) File Preliminaries
The file Data/portfolio.csv contains a list of lines with information
on a portfolio of stocks. Try a few experiments that show how to read
data from the file.
First, try reading the entire file all at once as a big string:
>>> f = open('Data/portfolio.csv', 'r')
>>> data = f.read()
>>> data
'name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n"CAT",150,83.44\n"MSFT",200,51.23\n"GE",95,40.37\n"MSFT",50,65.10\n"IBM",100,70.44\n'
>>> print data
name,shares,price
"AA",100,32.20
"IBM",50,91.10
"CAT",150,83.44
"MSFT",200,51.23
"GE",95,40.37
"MSFT",50,65.10
"IBM",100,70.44
>>>
In the above example, note that Python has two modes of output. In the
first mode, where you simply type data at the prompt, Python shows you
the raw string representation, including quotes and escape codes. When
you type print data, you get the actual formatted output of the string.
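The two output modes correspond to the built-in repr() and str() functions. Here is a minimal sketch that uses a short literal string in place of the file contents. The names sample, raw_form, and printed_form are only for illustration, and print is written with parentheses so the snippet runs unchanged on Python 2 and Python 3:

```python
# A short stand-in for the file contents (illustrative data, not read from disk).
sample = 'name,shares,price\n"AA",100,32.20\n'

# repr() gives the form the interactive prompt echoes: quotes and escape codes.
raw_form = repr(sample)
print(raw_form)

# str() (which print uses) gives the text itself, with real line breaks.
printed_form = str(sample)
print(printed_form)
```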
Although reading a file all at once is simple, it is often not the most appropriate approach, especially if the file is huge or if it contains lines of text that you want to handle one at a time. To read a file line-by-line, use a for-loop like this:
>>> f = open('Data/portfolio.csv', 'r')
>>> for line in f:
        print line,   # Note: trailing , omits the newline added by print
name,shares,price
"AA",100,32.20
"IBM",50,91.10
...
>>>
When you use this code as shown, lines are read until the end of the file is reached, at which point the loop stops.
On certain occasions, you might want to manually read or skip a single
line of text (e.g., to skip the first line of column headers).
To do that, use next() as shown here:
>>> f = open('Data/portfolio.csv', 'r')
>>> headers = next(f)
>>> headers
'name,shares,price\n'
>>> for line in f:
        print line,
"AA",100,32.20
"IBM",50,91.10
...
>>>
next() simply returns the next line of text in the file. If you were
to call it repeatedly, you would get successive lines. However, the
for-loop already uses next() to obtain its data. Thus, you normally
wouldn’t call it directly unless you’re trying to explicitly skip or
read a single line as shown.
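The interaction between next() and the for-loop can be sketched as follows. To stay self-contained, this example assumes an in-memory io.StringIO object in place of the real file; the sample data and the name remaining are illustrative only:

```python
import io

# An in-memory stand-in for the open file (illustrative data only).
f = io.StringIO(u'name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n')

headers = next(f)      # consumes the first line
first_row = next(f)    # a second call returns the following line

# The for-loop continues from wherever next() left off.
remaining = []
for line in f:
    remaining.append(line)

print(repr(headers))
print(repr(first_row))
print(remaining)
```

Because the loop and next() draw from the same underlying position in the file, lines consumed by next() are never seen again by the loop.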
Once you’re reading lines of a file, you can start to perform more processing such as splitting. For example, try this:
>>> f = open('Data/portfolio.csv', 'r')
>>> headers = next(f).split(',')
>>> headers
['name', 'shares', 'price\n']
>>> for line in f:
        row = line.split(',')
        print row
['"AA"', '100', '32.20\n']
['"IBM"', '50', '91.10\n']
...
>>>
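Notice that the last field of each row still carries the trailing newline and that the name keeps its double quotes. One possible way to tidy this up, shown here only as a sketch on a single hard-coded line, is to call strip() before splitting and strip('"') on the name field:

```python
line = '"AA",100,32.20\n'    # one raw line, as read from the file

# strip() removes the trailing newline before splitting on commas.
row = line.strip().split(',')

# The name field still carries its double quotes; strip('"') removes them.
name = row[0].strip('"')

print(row)
print(name)
```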
(b) Reading a data file
Now that you know how to read a file, let’s write a program to perform
a simple calculation. The columns in Data/portfolio.csv
correspond to the stock name, number of shares, and purchase price of
a single share.
Write a program called pcost.py that opens this file, reads all
lines, and calculates how much it cost to purchase all of the shares
in the portfolio. Hint: to convert a string to an integer, use
int(s). To convert a string to a floating-point number, use float(s).
Your program should print output such as the following:
Total cost 44671.15
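As a starting point, the conversion hint can be tried on a single row. This sketch is not the full pcost.py program; it assumes one row that has already been split as in part (a), and the variable names are illustrative:

```python
row = ['"AA"', '100', '32.20\n']    # one split line, as in part (a)

shares = int(row[1])     # convert the share count to an integer
price = float(row[2])    # float() tolerates surrounding whitespace such as '\n'

cost = shares * price
print('Cost of this row: %0.2f' % cost)
```

The complete program repeats this calculation for every data line and accumulates the results into a running total.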
(c) Other kinds of "files"
What happens if you read a non-text file such as a gzip-compressed data file?
>>> f = open('Data/portfolio.csv.gz','r')
>>> data = f.read()
>>> data
'\x1f\x8b\x08\x08\xa9\xc1\xd6R\x00... bunch of junk'
>>>
The funny codes such as '\x08' and '\xc1' represent the hex values
of non-printable bytes in the string. On this subject, the ord()
function converts a character into its integer character code value.
The chr() function converts an integer value back into a character:
>>> ord('A')
65
>>> ord('\xc1')
193
>>> chr(65)
'A'
>>> chr(8)
'\x08'
>>>
Python has a library module gzip that can read gzip-compressed files. For example:
>>> import gzip
>>> f = gzip.open('Data/portfolio.csv.gz')
>>> for line in f:
        print line,
... look at the output ...
>>>
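If Data/portfolio.csv.gz is not at hand, the difference between open() and gzip.open() can still be explored with a file you create yourself. The following is a sketch under that assumption: the file name sample.csv.gz and its contents are made up for illustration. On Python 3, the lines read back in binary mode are bytes objects rather than ordinary strings:

```python
import gzip
import os
import tempfile

# Create a small gzip file in a temporary directory (illustrative data only).
path = os.path.join(tempfile.mkdtemp(), 'sample.csv.gz')
with gzip.open(path, 'wb') as out:
    out.write(b'name,shares,price\n"AA",100,32.20\n')

# Plain open() in binary mode returns the compressed bytes as stored on disk.
with open(path, 'rb') as f:
    raw = f.read()

# gzip.open() decompresses transparently and supports line-by-line iteration.
lines = []
with gzip.open(path, 'rb') as f:
    for line in f:
        lines.append(line)

print(repr(raw[:2]))    # the two-byte gzip signature
print(lines)
```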
(d) Go scrape the web for some data
Python has a library module urllib that contains a function
urlopen(). This function opens a web page just as if it were a file.
Try it out by getting a list of prices for a few different stocks
from Yahoo (note: the "sl1" at the end of the URL is the letter "S",
the letter "L", and the number "1").
>>> import urllib
>>> u = urllib.urlopen('http://finance.yahoo.com/d/quotes.csv?s=AA,CAT,MSFT,GE,IBM&f=sl1')
>>> data = u.read()
>>> data
... look at the output ...
>>>
If you wanted to save the data in a file for later use, do this:
>>> f = open('prices.csv', 'wb') # Write binary (needed for Windows)
>>> f.write(data)
>>> f.close()
If this works, you’ll have a file prices.csv that contains a collection
of prices from today’s market.
Here’s a quick and dirty way to view the contents of the file you just wrote:
>>> print open('prices.csv').read()
... look at output ...
>>>
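The quick approach above never explicitly closes the file. A slightly more careful alternative, sketched here with made-up data and a temporary path (both assumptions for illustration), uses the with statement, which closes the file automatically when the block ends:

```python
import os
import tempfile

data = 'AA,32.20\nIBM,91.10\n'    # stand-in for downloaded data (illustrative)
path = os.path.join(tempfile.mkdtemp(), 'prices.csv')

# The with statement closes the file automatically, even if an error occurs.
with open(path, 'w') as f:
    f.write(data)

with open(path) as f:
    contents = f.read()

print(contents)
```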