Exercise 3.3

Objectives:

  • More on error handling and exceptions.

  • How to properly write code that catches exceptions.

Files Created: None.

Files Modified: fileparse.py

(a) Raising exceptions

The parse_csv() function you wrote in the last section allows user-specified columns to be selected, but that only works if the input data file has column headers. Modify the code so that an exception gets raised if both the select and has_headers=False arguments are passed. For example:

>>> parse_csv('Data/prices.csv', select=['name','price'], has_headers=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fileparse.py", line 9, in parse_csv
    raise RuntimeError("select argument requires column headers")
RuntimeError: select argument requires column headers
>>>
Discussion

Having added this one check, you might ask if you should be performing other kinds of sanity checks in the function. For example, should you check that the filename is a string, that types is a list, or anything of that nature? As a general rule, it’s usually best to skip such tests and to just let the program fail on bad inputs. The traceback message will point at the source of the problem and can assist in debugging. The main reason for adding the above check to avoid running the code in a non-sensical mode (e.g., using a feature that requires column headers, but simultaneously specifying that there are no headers). This indicates a programming error on the part of the calling code.

(b) Catching exceptions

The parse_csv() function you wrote is used to process the entire contents of a file. However, in the real-world, it’s possible that input files might have corrupted, missing, or dirty data. Try this experiment:

>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fileparse.py", line 36, in parse_csv
    row = [func(val) for func, val in zip(types, row)]
ValueError: invalid literal for int() with base 10: ''
>>>

Modify the parse_csv() function to catch all ValueError exceptions generated during record creation and print a warning message for rows that can’t be converted. The message should include the row number and information about the reason why it failed.

To test your function, try reading the file Data/missing.csv above. For example:

>>> portfolio = parse_csv('Data/missing.csv', types=[str, int, float])
Row 4: Couldn't convert ['MSFT', '', '51.23']
Row 4: Reason invalid literal for int() with base 10: ''
Row 7: Couldn't convert ['IBM', '', '70.44']
Row 7: Reason invalid literal for int() with base 10: ''
>>>
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
>>>

(c) Silencing Errors

Modify the parse_csv() function so that parsing error messages can be silenced if desired by the user. For example:

>>> portfolio = parse_csv('Data/missing.csv', types=[str,int,float], silence_errors=True)
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}]
>>>
Discussion

Error handling is one of the most difficult things to get right in most programs. As a general rule, you shouldn’t silently ignore errors. Instead, it’s better to report problems and to give the user an option to the silence the error message if they choose to do so.

Links