Exercise 4.3

Objectives:

  • Simple regular expression pattern matching.

Files Created: None

Files Modified: None

(a) Simple regular expression pattern matching

In this exercise, we experiment with regular expression pattern matching. First, define the string:

>>> text = "Guido was out of the office from 12/14/2012 to 1/3/2013."
>>>

Now, find all of the matching dates:

>>> import re
>>> dates = re.findall(r'(\d+)/(\d+)/(\d+)', text)
>>> dates
[('12', '14', '2012'), ('1', '3', '2013')]
>>>
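
Because the pattern contains three groups, each match comes back as a tuple of strings in the order month, day, year. As a minimal sketch of working with that result (the name `as_ints` is only illustrative and not part of the exercise), you could convert the pieces to integers:

```python
import re

text = "Guido was out of the office from 12/14/2012 to 1/3/2013."

# With groups in the pattern, findall() returns one tuple of strings per match
dates = re.findall(r'(\d+)/(\d+)/(\d+)', text)

# Convert each (month, day, year) tuple of strings into a tuple of integers
as_ints = [(int(month), int(day), int(year)) for month, day, year in dates]
print(as_ints)
```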

Replace the dates with a different format:

>>> newtext = re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
>>> newtext
'Guido was out of the office from 2012-12-14 to 2013-1-3.'
>>>
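
Notice that the second date comes out as 2013-1-3, without zero padding. If you wanted padded output, one possible approach is to pass a function to re.sub() instead of a replacement string; the function receives each match object and returns the replacement text. The helper name `iso_date` below is an assumption made for illustration:

```python
import re

text = "Guido was out of the office from 12/14/2012 to 1/3/2013."

def iso_date(m):
    # m is a match object; its groups are month, day, year in that order
    month, day, year = m.groups()
    # Pad month and day to two digits
    return f'{year}-{int(month):02d}-{int(day):02d}'

# re.sub() calls iso_date() once per match and substitutes its return value
newtext = re.sub(r'(\d+)/(\d+)/(\d+)', iso_date, text)
print(newtext)
```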

(b) Splitting text on multiple delimiters

Consider the following text containing a timestamp.

>>> text = 'Fri Jan 17 12:22:52 CST 2014'
>>>

Suppose you wanted to split the date into parts by splitting the string on space and colon (:) characters. Try this:

>>> parts = ['weekday', 'month', 'day', 'hour', 'minute', 'second', 'timezone', 'year']
>>> d = dict(zip(parts, re.split(r'[ :]', text)))
>>> d
{'weekday': 'Fri', 'month': 'Jan', 'day': '17', 'hour': '12', 'minute': '22', 'second': '52', 'timezone': 'CST', 'year': '2014'}
>>>
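
One thing to watch for: the pattern [ :] splits on every single delimiter character, so two delimiters in a row produce an empty string in the result. The sketch below uses a made-up timestamp with a space-padded day to show the difference between [ :] and [ :]+ (the variable names `single` and `runs` are illustrative only):

```python
import re

# Hypothetical timestamp with two spaces before a single-digit day
text = 'Fri Jan  7 12:22:52 CST 2014'

# Splitting on each delimiter character leaves an empty string between the two spaces
single = re.split(r'[ :]', text)
print(single)

# Adding + treats a run of delimiters as one separator
runs = re.split(r'[ :]+', text)
print(runs)
```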

(c) Number conversion

Consider the following list of strings:

>>> vals = ['1','2','-','3','N/A','4','-5','+6']
>>>

Now, suppose you wanted to convert all of the values into integers using a list comprehension. Sadly, it doesn’t work:

>>> data = [int(val) for val in vals]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '-'
>>>

To avoid this, you might try putting some sort of simple filter on it. For example:

>>> data = [(int(val) if val.isdigit() else None) for val in vals]
>>> data
[1, 2, None, 3, None, 4, None, None]
>>>

Sadly, that didn’t work either: isdigit() returns False for any string containing a sign character, so the perfectly valid values -5 and +6 didn’t get converted. Fortunately, you can wield the awesome power of a regex here. Try this:

>>> data = [(int(val) if re.match(r'[+-]?\d+$', val) else None) for val in vals]
>>> data
[1, 2, None, 3, None, 4, -5, 6]
>>>
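
If you use this test in more than one place, it may read better as a small helper. The following is only a sketch: the names `INT_PATTERN` and `to_int` are made up for illustration, and it uses fullmatch() as an alternative to anchoring the pattern with $.

```python
import re

# Optional sign followed by one or more digits
INT_PATTERN = re.compile(r'[+-]?\d+')

def to_int(val):
    # fullmatch() succeeds only if the entire string matches the pattern
    return int(val) if INT_PATTERN.fullmatch(val) else None

vals = ['1', '2', '-', '3', 'N/A', '4', '-5', '+6']
data = [to_int(val) for val in vals]
print(data)
```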

Ah yes, mixing list comprehensions and regular expressions together. At the very least, you have slightly increased your job security.
