Exercise 1.6

Objectives:

  • How to open and read data from files.

  • Using the for-loop to iterate over all of the lines in a file.

  • Converting text strings into numbers.

  • Performing simple calculations with column-oriented data.

  • How to open different kinds of files.

Files Created: pcost.py

Caution

For this exercise involving files, you may encounter problems if Python is running in a different working directory than the practical-python/ folder.

If you are using IDLE, make sure that it is running in the SAME DIRECTORY as the directory where you are editing your solutions (e.g., "C:\practical-python"). If you started IDLE by clicking on the RunIDLE.pyw file, then it should already be set up.
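If you aren't sure which directory Python is running in, the os module can tell you. os.getcwd() returns the current working directory, and os.chdir() changes it. Below is a minimal sketch. The helper name data_file_visible() is hypothetical, not part of the course files.

```python
import os

def data_file_visible(relpath=os.path.join('Data', 'portfolio.csv')):
    """Hypothetical helper: return the current working directory and
    whether relpath can be found relative to it."""
    cwd = os.getcwd()
    return cwd, os.path.exists(relpath)
```

If the second value it returns is False, calling os.chdir() with the path to your practical-python/ folder should make relative paths such as Data/portfolio.csv work.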

(a) File Preliminaries

The file Data/portfolio.csv contains a list of lines with information on a portfolio of stocks. Try a few experiments that show how to read data from the file.

First, try reading the entire file all at once as a big string:

>>> f = open('Data/portfolio.csv', 'r')
>>> data = f.read()
>>> data
'name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n"CAT",150,83.44\n"MSFT",200,51.23\n"GE",95,40.37\n"MSFT",50,65.10\n"IBM",100,70.44\n'
>>> print data
name,shares,price
"AA",100,32.20
"IBM",50,91.10
"CAT",150,83.44
"MSFT",200,51.23
"GE",95,40.37
"MSFT",50,65.10
"IBM",100,70.44

>>>

In the above example, notice that Python has two ways of showing a value. When you simply type data at the prompt, Python shows the string's raw representation, including the surrounding quotes and escape codes such as \n. When you type print data, you get the actual text of the string, with each \n producing a new line.
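The two forms correspond to the built-in functions repr() and str(). This small sketch applies both to the header line of the file. repr() adds the surrounding quotes and spells the newline as the two characters backslash and n, while str() leaves the text unchanged.

```python
# The header line from Data/portfolio.csv, including its newline
s = 'name,shares,price\n'

raw = repr(s)   # the form the prompt displays when you type s
text = str(s)   # the plain contents that print writes out
```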

Although reading a file all at once is simple, it is often not the most appropriate approach, especially if the file is huge or if it contains lines of text that you want to handle one at a time. To read a file line by line, use a for-loop like this:

>>> f = open('Data/portfolio.csv', 'r')
>>> for line in f:
       print line,        # Note: trailing , omits the newline added by print

name,shares,price
"AA",100,32.20
"IBM",50,91.10
...

>>>

When you use this code as shown, the loop reads lines until it reaches the end of the file, at which point it stops.
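Because the loop hands you one line at a time, you can process a file of any size without holding all of it in memory. As an illustration, this sketch counts the lines in a file. count_lines() is a hypothetical helper name. The with statement closes the file automatically when its block ends.

```python
def count_lines(filename):
    """Count the lines in a text file, reading one line at a time."""
    count = 0
    with open(filename, 'r') as f:    # f is closed when the with-block ends
        for _ in f:                   # each pass of the loop reads one more line
            count += 1
    return count
```

For example, count_lines('Data/portfolio.csv') should count the header line plus one line per holding.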

Occasionally, you might want to manually read or skip a single line of text (e.g., to skip a first line of column headers). To do that, use next() as shown here:

>>> f = open('Data/portfolio.csv', 'r')
>>> headers = next(f)
>>> headers
'name,shares,price\n'
>>> for line in f:
        print line,

"AA",100,32.20
"IBM",50,91.10
...
>>>

next() simply returns the next line of text in the file. If you call it repeatedly, you get successive lines. Note that the for-loop already uses the same mechanism as next() to obtain its data, so you normally wouldn’t call next() directly unless you’re trying to explicitly skip or read a single line as shown.
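When the file is exhausted, next() raises StopIteration. You can instead pass a default value as a second argument, and next() returns that value at the end of the file. The sketch below uses io.StringIO, an in-memory object that behaves like an open text file, holding the first two lines of Data/portfolio.csv, so it runs without the data file.

```python
import io

# In-memory stand-in for the first two lines of Data/portfolio.csv
f = io.StringIO(u'name,shares,price\n"AA",100,32.20\n')

headers = next(f)        # first line: the column headers
first = next(f)          # second line: the first data row
done = next(f, None)     # at end of file, return the default instead of raising StopIteration
```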

Once you’re reading lines of a file, you can start to perform more processing such as splitting. For example, try this:

>>> f = open('Data/portfolio.csv', 'r')
>>> headers = next(f).split(',')
>>> headers
['name', 'shares', 'price\n']
>>> for line in f:
        row = line.split(',')
        print row

['"AA"', '100', '32.20\n']
['"IBM"', '50', '91.10\n']
...
>>>
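Notice the trailing '\n' still attached to the last field. Calling strip() on each line before splitting removes it, along with any other leading or trailing whitespace. Here is a minimal sketch using a hypothetical helper named parse_line(). The quotes around names such as "AA" are left in place.

```python
def parse_line(line):
    """Split one comma-separated line into fields, dropping the trailing newline first."""
    return line.strip().split(',')
```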

(b) Reading a data file

Now that you know how to read a file, let’s write a program to perform a simple calculation. The columns in Data/portfolio.csv correspond to the stock name, number of shares, and purchase price of a single share.

Write a program called pcost.py that opens this file, reads all lines, and calculates how much it cost to purchase all of the shares in the portfolio. Hint: to convert a string to an integer, use int(s). To convert a string to a floating-point number, use float(s).

Your program should print output such as the following:

Total cost 44671.15
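Below is one possible shape for the calculation, not the only solution. The name portfolio_cost() is hypothetical. It accepts any iterable of lines, so you can pass it an open file. It skips the header with next(), then converts the shares and price columns with int() and float() as the hint suggests.

```python
def portfolio_cost(lines):
    """Return the total purchase cost (shares * price) for lines in the
    format name,shares,price, where the first line is a header."""
    lines = iter(lines)
    next(lines)                       # skip the column headers
    total = 0.0
    for line in lines:
        row = line.strip().split(',')
        shares = int(row[1])          # second column: number of shares
        price = float(row[2])         # third column: purchase price per share
        total += shares * price
    return total

# In pcost.py you might use it like this:
#   with open('Data/portfolio.csv', 'r') as f:
#       print('Total cost %0.2f' % portfolio_cost(f))
```

The %0.2f format rounds the result to two decimal places, which hides the tiny rounding errors that floating-point arithmetic introduces.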

(c) Other kinds of "files"

What happens if you read a non-text file, such as a gzip-compressed data file?

>>> f = open('Data/portfolio.csv.gz','r')
>>> data = f.read()
>>> data
'\x1f\x8b\x08\x08\xa9\xc1\xd6R\x00... bunch of junk'
>>>

The funny codes such as '\x08' and '\xc1' are escape sequences that give the hexadecimal value of bytes that can't be shown as ordinary printable characters. On this subject, the ord() function converts a character into its integer character code, and the chr() function converts an integer back into a character:

>>> ord('A')
65
>>> ord('\xc1')
193
>>> chr(65)
'A'
>>> chr(8)
'\x08'
>>>
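The first two bytes in the gzip output above, '\x1f' and '\x8b' (31 and 139 in decimal), are the fixed "magic number" that begins every gzip file, so checking them is a quick way to recognize gzip data. This is a sketch. looks_like_gzip() is a hypothetical helper, and the compressed sample is built in memory rather than read from Data/portfolio.csv.gz.

```python
import gzip
import io

GZIP_MAGIC = bytearray(b'\x1f\x8b')    # every gzip stream starts with these two bytes

def looks_like_gzip(data):
    """Return True if the raw bytes in data begin with the gzip magic number."""
    return bytearray(data[:2]) == GZIP_MAGIC

# Build a small gzip stream in memory to try it on
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as z:
    z.write(b'name,shares,price\n')
compressed = buf.getvalue()
```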

Python has a library module, gzip, that can read gzip-compressed files. For example:

>>> import gzip
>>> f = gzip.open('Data/portfolio.csv.gz')
>>> for line in f:
           print line,

... look at the output ...
>>>
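One detail to be aware of: in Python 3, gzip.open() in its default 'rb' mode produces bytes rather than text, so lines come back looking like b'name,shares,price\n'. The sketch below decodes each line explicitly. read_gzip_lines() is a hypothetical helper, and UTF-8 is an assumed encoding. In Python 3 you can also pass mode 'rt' to gzip.open() to get text directly.

```python
import gzip

def read_gzip_lines(path, encoding='utf-8'):
    """Yield the lines of a gzip-compressed file as text strings.

    The file is opened in binary mode, so each line arrives as bytes
    and is decoded here (the encoding is an assumption about the data).
    """
    with gzip.open(path, 'rb') as f:
        for line in f:
            yield line.decode(encoding)
```

For example, `for line in read_gzip_lines('Data/portfolio.csv.gz'):` loops over the same lines as the uncompressed file.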

(d) Go scrape the web for some data

Note

If your machine requires the use of an HTTP proxy server, you may need to set the HTTP_PROXY environment variable to make this part work. For example:

>>> import os
>>> os.environ['HTTP_PROXY'] = 'http://yourproxy.server.com'
>>>

Python has a library module urllib that contains a function urlopen(). This function opens a web page just as if it were a file. Try it out by getting a list of prices for a few different stocks from Yahoo (note: the "sl1" at the end of the URL is the letter "S", the letter "L", and the number "1"). Yahoo has since discontinued this CSV quote service, so the request may fail or return no useful data.

>>> import urllib
>>> u = urllib.urlopen('http://finance.yahoo.com/d/quotes.csv?s=AA,CAT,MSFT,GE,IBM&f=sl1')
>>> data = u.read()
>>> data
... look at the output ...
>>>
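In Python 3, urlopen() lives in urllib.request instead, and read() returns bytes. The sketch below imports whichever version is available and turns a response into a dictionary mapping each symbol to its price. Two assumptions: fetch_prices() and parse_prices() are hypothetical helpers, and the response is assumed to hold one "SYMBOL",price line per stock. Because parse_prices() works on plain text, you can try it without network access.

```python
import csv

try:
    from urllib.request import urlopen    # Python 3 location
except ImportError:
    from urllib import urlopen            # Python 2 location, as used above

def parse_prices(text):
    """Return a {symbol: price} dict from lines assumed to look like "AA",9.47."""
    prices = {}
    for row in csv.reader(text.splitlines()):   # csv.reader removes the quotes
        if len(row) < 2:
            continue                            # skip blank or short lines
        try:
            prices[row[0]] = float(row[1])
        except ValueError:
            pass                                # skip prices that aren't numbers (e.g. N/A)
    return prices

def fetch_prices(url, encoding='ascii'):
    """Download url and parse the body with parse_prices()."""
    u = urlopen(url)
    try:
        return parse_prices(u.read().decode(encoding))
    finally:
        u.close()
```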

If you wanted to save the data in a file for later use, do this:

>>> f = open('prices.csv', 'wb')     # 'wb' = write binary; avoids newline translation on Windows
>>> f.write(data)
>>> f.close()
>>>

If this works, you’ll have a file prices.csv that contains a collection of prices from today’s market.

Here’s a quick and dirty way to view the contents of the file you just wrote:

>>> print open('prices.csv').read()
... look at output ...
>>>
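The one-liner above never explicitly closes the file. A tidier pattern uses the with statement, which closes the file automatically when its block ends, even if an error occurs. This sketch saves and reloads the downloaded data that way. save_data() and load_data() are hypothetical helper names, and data is assumed to be the raw bytes returned by read().

```python
def save_data(filename, data):
    """Write raw bytes to filename; binary mode avoids newline translation."""
    with open(filename, 'wb') as f:
        f.write(data)

def load_data(filename):
    """Read the whole file back as raw bytes."""
    with open(filename, 'rb') as f:
        return f.read()
```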