Objectives:

  • Apply topics in section 2 to problems in data aggregation

Files Modified: ctarides.py

In Practicum 1, you wrote a program ctarides.py that tabulated the total number of rides on the CTA for a given year. In this practicum, we extend that program by building some data structures and doing a few more interesting calculations.

(a) Reading the data into a data structure

Start by writing a function read_rides(filename) that reads the ride data into a list of dictionaries where each dictionary looks something like this:

{
   'station_id' : 40010,                    # Integer
   'station_name' : 'Austin-Forest Park',   # String
   'date' : '01/01/2001',                   # String
   'daytype' : 'U',                         # String
   'rides' : 290                            # Integer
}

Your function should work like this:

>>> rides = read_rides('Data/ctarail.csv')
>>> len(rides)
621974
>>> # Examine the first 4 entries
>>> rides[0:4]
[{'date': '01/01/2001', 'daytype': 'U', 'station_name': 'Austin-Forest Park', 'rides': 290, 'station_id': 40010}, {'date': '01/01/2001', 'daytype': 'U', 'station_name': 'Harlem-Lake', 'rides': 633, 'station_id': 40020}, {'date': '01/01/2001', 'daytype': 'U', 'station_name': 'Pulaski-Lake', 'rides': 483, 'station_id': 40030}, {'date': '01/01/2001', 'daytype': 'U', 'station_name': 'Quincy/Wells', 'rides': 374, 'station_id': 40040}]
>>>

Once you have the complete data set loaded, you can start to do more interesting things with it.
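
One possible shape for read_rides() is sketched below. It is not the only way to do it. It assumes that the CSV file starts with a header row and that the columns appear in the order station id, station name, date, day type, rides; check your copy of Data/ctarail.csv before relying on that layout.

```python
import csv

def read_rides(filename):
    '''
    Read CTA ride data into a list of dictionaries.
    Assumes a header row, then rows of the form
    station_id, station_name, date, daytype, rides.
    '''
    rides = []
    with open(filename, newline='') as f:
        rows = csv.reader(f)
        next(rows)                        # Skip the header row (assumed present)
        for row in rows:
            record = {
                'station_id': int(row[0]),    # Convert to integer
                'station_name': row[1],
                'date': row[2],               # Left as a 'MM/DD/YYYY' string
                'daytype': row[3],
                'rides': int(row[4]),         # Convert to integer
            }
            rides.append(record)
    return rides
```

Converting station_id and rides to integers at read time means later steps can add and compare them without further conversion.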

(b) Finding the busiest L stations in Chicago

Try these steps to get started:

>>> # Collect all of the unique station_ids into a set
>>> stations = { r['station_id'] for r in rides }
>>>

>>> # Make a dictionary for tabulating counts
>>> station_counts = dict.fromkeys(stations, 0)
>>>

>>> # Count rides
>>> for ride in rides:
...     station_counts[ride['station_id']] += ride['rides']
...

>>> # Show number of rides at different stations
>>> station_counts[40010]
6401782
>>> station_counts[40020]
13093093
>>>
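
As an aside, the same tabulation can be done with collections.Counter from the standard library, which starts every key at zero and so does not need the set of station ids up front. The sketch below uses a few made-up ride records in place of the full data set.

```python
from collections import Counter

# Made-up records for illustration only (not real CTA data)
rides = [
    {'station_id': 1, 'rides': 10},
    {'station_id': 2, 'rides': 25},
    {'station_id': 1, 'rides': 5},
]

# Missing keys count as 0, so no dict.fromkeys() step is needed
station_counts = Counter()
for ride in rides:
    station_counts[ride['station_id']] += ride['rides']

# most_common() returns (station_id, count) pairs, largest count first
ranked = station_counts.most_common()
```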

Using the data in station_counts, write a program to output a table showing the total rides at each station ranked from most to least busy. For example:

Id   |Name                     |Count
-----|-------------------------|--------
40380|Clark/Lake               |56772423
41450|Chicago/State            |54746133
41660|Lake/State               |51524924
...
41510|Morgan-Lake              |288944
41680|Oakton-Skokie            |173746

To do this, you’ll need to figure out how to sort the data in station_counts. Also, you’ll need a way to associate the station name with the station id. To make the output look nice, you’ll also need to fiddle with string formatting a bit.
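
The three pieces mentioned above (sorting, name lookup, and formatting) could be sketched as follows. The helper names station_names() and format_table() are made up for this illustration, and the column widths are chosen to match the sample table.

```python
def station_names(rides):
    '''
    Map each station_id to its station_name.
    If a station appears under several names, the last one seen wins.
    '''
    return {r['station_id']: r['station_name'] for r in rides}

def format_table(station_counts, names):
    '''
    Return the table as a list of lines, busiest station first.
    '''
    # Sort (station_id, count) pairs by count, largest first
    ranked = sorted(station_counts.items(),
                    key=lambda item: item[1], reverse=True)
    lines = [
        f'{"Id":<5}|{"Name":<25}|Count',
        '-' * 5 + '|' + '-' * 25 + '|' + '-' * 8,
    ]
    for station_id, count in ranked:
        lines.append(f'{station_id:<5d}|{names[station_id]:<25}|{count}')
    return lines
```

With the real data loaded, print('\n'.join(format_table(station_counts, station_names(rides)))) would print the table.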

(c) Yearly totals

Write a function yearly_totals(rides, year) that computes the total rides at each station for the given year (passed as a string) and returns the result in a dictionary keyed by station id:

>>> y2012 = yearly_totals(rides, '2012')
>>> y2012
{40960: 1573557, 40450: 3982594, 41220: 4419350, 41430: 1589531, 40710: 2032116, 41480: 1316509, 40970: 457900, 40460: 1916772, ... }

>>> # Find 2012 entries at station 40010
>>> y2012[40010]
643398

>>> # Find change in ridership 2001-2012
>>> y2001 = yearly_totals(rides, '2001')
>>> y2012[40010] - y2001[40010]
177137
>>>
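
A minimal sketch of yearly_totals() follows. It assumes every date is a string of the form 'MM/DD/YYYY', as in the sample record in part (a), so the year is the last field after splitting on '/'.

```python
def yearly_totals(rides, year):
    '''
    Total the rides at each station for one year.
    year is a string such as '2012'.
    '''
    totals = {}
    for r in rides:
        # Dates look like '01/01/2001'; the year is the last field
        if r['date'].split('/')[-1] == year:
            station_id = r['station_id']
            totals[station_id] = totals.get(station_id, 0) + r['rides']
    return totals
```

Note that in this version a station with no records in the given year does not appear in the result at all, so code that looks up a total should be prepared for a missing key.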

Modify your program so that it outputs an extra column showing the change in ridership from 2001 to 2012. For example:

Id   |Name                     |Count   |Change
-----|-------------------------|--------|--------
40380|Clark/Lake               |56772423| 1378717
41450|Chicago/State            |54746133| 1097009
41660|Lake/State               |51524924| 1680578
...
41510|Morgan-Lake              |  288944|  288944
41680|Oakton-Skokie            |  173746|  173746
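
One way to compute and format the extra column is sketched below. The helper names ridership_change() and format_row() are made up for this illustration. The sketch treats a station with no total in one of the years as having zero rides that year; that is an assumption, although it is consistent with the sample output, where Morgan-Lake and Oakton-Skokie show a change equal to their count.

```python
def ridership_change(y_start, y_end):
    '''
    Return a dictionary mapping station_id to the change in rides
    between two yearly totals. A station missing from either
    dictionary is treated as having 0 rides that year (an assumption).
    '''
    stations = set(y_start) | set(y_end)
    return {s: y_end.get(s, 0) - y_start.get(s, 0) for s in stations}

def format_row(station_id, name, count, change):
    '''
    Format one table row with right-aligned Count and Change columns.
    '''
    return f'{station_id:<5d}|{name:<25}|{count:>8d}|{change:>8d}'
```

A width of 8 for the last column leaves room for the minus sign on a seven-digit decrease.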

(d) Re-ordering the data

Modify your program so that it ranks the data according to the total change in ridership, from the largest increase to the largest decrease. For example:

Id   |Name                     |Count   |Change
-----|-------------------------|--------|--------
41400|Roosevelt                |33249846| 1817656
41660|Lake/State               |51524924| 1680578
40380|Clark/Lake               |56772423| 1378717
41220|Fullerton                |43086899| 1284827
...
41170|Garfield-Dan Ryan        |15741450| -169546
40990|69th                     |23355328| -310984
40450|95th/Dan Ryan            |50304330| -355227
40500|Washington/State         |14321124|-2179181

To do this last part, you’ll need to collect and sort the data according to the change value. You might need to investigate the key argument to the sort() method.
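
As a small illustration of the key argument, the sketch below sorts a few rows taken from the sample table above. Representing each row as an (id, name, count, change) tuple is just one possible choice, and rank_by_change() is a made-up helper name.

```python
def rank_by_change(rows):
    '''
    Return rows sorted by the change value (index 3), largest first.
    Each row is a tuple (station_id, name, count, change).
    '''
    # key picks out the value to compare; reverse=True puts gains first
    return sorted(rows, key=lambda row: row[3], reverse=True)

# A few rows from the sample output, deliberately out of order
rows = [
    (40380, 'Clark/Lake', 56772423, 1378717),
    (40500, 'Washington/State', 14321124, -2179181),
    (41400, 'Roosevelt', 33249846, 1817656),
    (40990, '69th', 23355328, -310984),
]
ranked = rank_by_change(rows)
```

The list method rows.sort(key=..., reverse=True) takes the same arguments but sorts the list in place instead of returning a new one.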
