Practicum 2: Transit Trends
In Practicum 1, you wrote a program, ctarides.py, that tabulated the total
number of rides on the CTA for a given year. In this practicum, we build on
that program by creating some data structures and doing a few more
interesting calculations.
(a) Reading the data into a data structure
Start by writing a function read_rides(filename) that reads the ride data
into a list of dictionaries, where each dictionary looks something like this:
{
    'station_id'   : 40010,                 # Integer
    'station_name' : 'Austin-Forest Park',  # String
    'date'         : '01/01/2001',          # String
    'daytype'      : 'U',                   # String
    'rides'        : 290                    # Integer
}
Your function should work like this:
>>> rides = read_rides('Data/ctarail.csv')
>>> len(rides)
621974
>>> # Examine the first 4 entries
>>> rides[0:4]
[{'date': '01/01/2001', 'daytype': 'U', 'station_name': 'Austin-Forest Park', 'rides': 290, 'station_id': 40010}, {'date': '01/01/2001', 'daytype': 'U', 'station_name': 'Harlem-Lake', 'rides': 633, 'station_id': 40020}, {'date': '01/01/2001', 'daytype': 'U', 'station_name': 'Pulaski-Lake', 'rides': 483, 'station_id': 40030}, {'date': '01/01/2001', 'daytype': 'U', 'station_name': 'Quincy/Wells', 'rides': 374, 'station_id': 40040}]
>>>
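One possible way to approach read_rides() is sketched below using the standard csv module. It is only a sketch under two assumptions that you should check against your copy of the data: that the file starts with a header row, and that the columns appear in the order station id, station name, date, day type, rides.

```python
import csv

def read_rides(filename):
    """Read CTA ride data into a list of dictionaries."""
    rides = []
    with open(filename, newline='') as f:
        rows = csv.reader(f)
        next(rows)  # Skip the header row (assumed to be present)
        for row in rows:
            # Assumed column order:
            # station_id, station_name, date, daytype, rides
            rides.append({
                'station_id': int(row[0]),
                'station_name': row[1],
                'date': row[2],
                'daytype': row[3],
                'rides': int(row[4]),
            })
    return rides
```

The two int() conversions matter: without them, the ride counts would be strings and could not be added together later.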
Once you have the complete data set loaded, you can start to do more interesting things with it.
(b) Finding the busiest L stations in Chicago
Try these steps to get started:
>>> # Collect all of the unique station_ids into a set
>>> stations = { r['station_id'] for r in rides }
>>>
>>> # Make a dictionary for tabulating counts
>>> station_counts = dict.fromkeys(stations, 0)
>>>
>>> # Count rides
>>> for ride in rides:
...     station_counts[ride['station_id']] += ride['rides']
...
>>> # Show number of rides at different stations
>>> station_counts[40010]
6401782
>>> station_counts[40020]
13093093
>>>
Using the data in station_counts, write a program to output a table
showing the total rides at each station, ranked from most to least
busy. For example:
Id   |Name                     |Count
-----|-------------------------|--------
40380|Clark/Lake               |56772423
41450|Chicago/State            |54746133
41660|Lake/State               |51524924
...
41510|Morgan-Lake              |288944
41680|Oakton-Skokie            |173746
To do this, you'll need to figure out how to sort the data in station_counts.
You'll also need a way to associate the station name with the station id. To
make the output look nice, you'll need to fiddle with string formatting a bit.
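The three pieces mentioned here (sorting, name lookup, formatting) might fit together roughly as follows. This is a sketch, not the required solution: the function name ranking_table and the station_names dictionary are made up for illustration, the column widths are read off the example table, and the sample values are a few rows copied from it.

```python
def ranking_table(station_counts, station_names):
    """Return table lines ranking stations from most to least busy."""
    # items() gives (station_id, count) pairs; sort on the count, largest first
    ranked = sorted(station_counts.items(),
                    key=lambda item: item[1], reverse=True)
    lines = [f"{'Id':<5}|{'Name':<25}|Count",
             '-' * 5 + '|' + '-' * 25 + '|' + '-' * 8]
    for station_id, count in ranked:
        name = station_names[station_id]
        lines.append(f'{station_id:5d}|{name:<25}|{count}')
    return lines

# Sample values copied from the table above. With the full data set,
# the name lookup could be built from the rides list, for example:
#     station_names = { r['station_id']: r['station_name'] for r in rides }
station_counts = {41680: 173746, 40380: 56772423, 41660: 51524924}
station_names = {40380: 'Clark/Lake', 41660: 'Lake/State',
                 41680: 'Oakton-Skokie'}

for line in ranking_table(station_counts, station_names):
    print(line)
```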
(c) Yearly totals
Write a function yearly_totals(rides, year) that computes the station totals
for a given year and returns the result in a dictionary:
>>> y2012 = yearly_totals(rides, '2012')
>>> y2012
{40960: 1573557, 40450: 3982594, 41220: 4419350, 41430: 1589531, 40710: 2032116, 41480: 1316509, 40970: 457900, 40460: 1916772, ... }
>>> # Find 2012 entries at station 40010
>>> y2012[40010]
643398
>>> # Find change in ridership 2001-2012
>>> y2001 = yearly_totals(rides, '2001')
>>> y2012[40010] - y2001[40010]
177137
>>>
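A minimal sketch of yearly_totals() is shown below. It assumes every date is a string of the form 'mm/dd/yyyy', as in the sample record in part (a), and that year is passed as a string, as in the session above. Using defaultdict is one choice among several; a plain dictionary with get() would work equally well.

```python
from collections import defaultdict

def yearly_totals(rides, year):
    """Return {station_id: total rides} for the given year (a string)."""
    totals = defaultdict(int)
    for r in rides:
        # Dates look like 'mm/dd/yyyy', so the year is the last field
        if r['date'].split('/')[-1] == year:
            totals[r['station_id']] += r['rides']
    # Convert to a plain dict so that missing stations raise KeyError
    return dict(totals)
```

Note that a station with no rides in the requested year will not appear in the result at all.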
Modify your program so that it outputs an extra column showing the change in ridership from 2001 to 2012. For example:
Id   |Name                     |Count   |Change
-----|-------------------------|--------|--------
40380|Clark/Lake               |56772423| 1378717
41450|Chicago/State            |54746133| 1097009
41660|Lake/State               |51524924| 1680578
...
41510|Morgan-Lake              |  288944|  288944
41680|Oakton-Skokie            |  173746|  173746
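One detail to watch for: in the last two rows the change equals the count, which suggests those stations have no 2001 data at all, so a lookup such as y2001[41510] would raise a KeyError. A small sketch of one way to handle this follows. The helper name ridership_change is hypothetical, and treating a missing earlier total as 0 is an assumption inferred from the table, not something the data guarantees.

```python
def ridership_change(later, earlier):
    """Return {station_id: later total - earlier total}.

    A station missing from `earlier` is treated as having 0 rides
    (an assumption inferred from the example table).
    """
    return { sid: total - earlier.get(sid, 0)
             for sid, total in later.items() }
```

With the results from part (c), this could be called as ridership_change(y2012, y2001).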
(d) Re-ordering the data
Modify your program so that it ranks the data according to the total change in ridership. For example:
Id   |Name                     |Count   |Change
-----|-------------------------|--------|--------
41400|Roosevelt                |33249846| 1817656
41660|Lake/State               |51524924| 1680578
40380|Clark/Lake               |56772423| 1378717
41220|Fullerton                |43086899| 1284827
...
41170|Garfield-Dan Ryan        |15741450| -169546
40990|69th                     |23355328| -310984
40450|95th/Dan Ryan            |50304330| -355227
40500|Washington/State         |14321124|-2179181
To do this last part, you'll need to collect and sort the data
according to the change value. You might need to investigate
the key argument to the sort() method.
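As a hint, here is a small sketch of how the key argument works. It assumes the data has been collected into a list of (station_id, name, count, change) tuples, which is only one possible layout; the four rows are copied from the table above.

```python
# Rows of (station_id, name, count, change), copied from the table above
rows = [
    (40380, 'Clark/Lake', 56772423, 1378717),
    (40500, 'Washington/State', 14321124, -2179181),
    (41400, 'Roosevelt', 33249846, 1817656),
    (40450, '95th/Dan Ryan', 50304330, -355227),
]

# key is called once per row and returns the value to sort on;
# here that is the change column. reverse=True puts the largest first.
rows.sort(key=lambda row: row[3], reverse=True)

for station_id, name, count, change in rows:
    print(f'{station_id:5d}|{name:<25}|{count:8d}|{change:8d}')
```

The list is sorted in place, so rows itself ends up ordered from the largest gain to the largest loss.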