Speak and Shout

Tuesday, December 18, 2007

Something I hate about Python's re module

UPDATE: I've gotten a couple of comments about this, and both seem to illustrate (with varying degrees of tact) that my complaint wasn't all that clear. My problem with the re module is this: Python's documentation gives the signature of re.match as match(pattern, string[, flags]), where pattern can be either a regex string or a compiled regex object. If it's a compiled regex object, then supplying an optional flag to re.match (in my case, re.IGNORECASE) doesn't work and, more to the point, fails silently. If it's not going to work, I think it should throw an exception. Better still, I think it should work the way I've illustrated because, IMO, that's the most natural way to use the API.

----
I've been burned at least three separate times by the following problem: I'll start out with a simple uncompiled regex for testing and then switch over to a compiled regex. Suddenly, the whole thing stops working.

Here's an example of what I'm doing. (It's taken from O'Reilly's Regular Expression Pocket Reference by Tony Stubblebine.)

import re
dailybugle = r'Spider-Man Menaces City!'
pattern = r'spider[- ]?man.'
if re.match(pattern, dailybugle, re.IGNORECASE):
    print dailybugle

This prints out 'Spider-Man Menaces City!' as expected. Now I want to compile the regular expression for speed, so I change the code to look like this:

import re
dailybugle = r'Spider-Man Menaces City!'
pattern = re.compile(r'spider[- ]?man.')
if re.match(pattern, dailybugle, re.IGNORECASE):
    print dailybugle


Looks simple, right? I just wrapped the pattern string in a call to re.compile(). Unfortunately, the whole thing now quietly fails: the match returns nothing and the headline never prints. What the ... ? Take the re.compile out, and it starts working again.
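To see which behavior a given interpreter has, here's a minimal sketch that probes it directly. The helper name try_flag_on_compiled is made up for illustration. It assumes that an interpreter which rejects the flag does so with a ValueError, which is what later Python releases do as far as I know; the interpreter described in this post just drops the flag.

```python
import re

dailybugle = 'Spider-Man Menaces City!'
pattern = re.compile(r'spider[- ]?man.')  # compiled WITHOUT re.IGNORECASE


def try_flag_on_compiled(compiled, text):
    """Report what re.match does with a compiled pattern plus a flag.

    Hypothetical helper, not part of the re module.
    """
    try:
        result = re.match(compiled, text, re.IGNORECASE)
    except ValueError:
        # Assumption: interpreters that refuse the combination raise ValueError.
        return 'raised'
    # No exception was raised. If there is also no match, the flag was dropped
    # and the lowercase pattern was applied case-sensitively.
    return 'matched' if result else 'ignored'


outcome = try_flag_on_compiled(pattern, dailybugle)
print(outcome)
```

An outcome of 'ignored' is the silent failure I'm complaining about; 'raised' is the loud failure I'd prefer.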



The solution is to move the re.IGNORECASE flag into the re.compile call, like so:

import re
dailybugle = r'Spider-Man Menaces City!'
pattern = re.compile(r'spider[- ]?man.', re.IGNORECASE)
if re.match(pattern, dailybugle):
    print dailybugle


In my opinion, this solution is very unintuitive and requires more rejiggering of the code than it should. Even worse, in my first attempt at using a compiled regex, re.match accepts a re.IGNORECASE flag and then disregards it. That kind of call should throw an exception.
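For comparison, here's a short sketch of two ways to keep the flag attached to the compiled pattern itself, so there's nothing left to pass at match time. Both use standard re features (the compiled object's own match method and the inline (?i) modifier); the variable names first, second, and inline are just mine.

```python
import re

dailybugle = 'Spider-Man Menaces City!'

# Option 1: compile with the flag, then call the pattern object's own match
# method, which takes no flags argument at all.
pattern = re.compile(r'spider[- ]?man.', re.IGNORECASE)
first = pattern.match(dailybugle)

# Option 2: put the flag inside the pattern text with the inline (?i)
# modifier, so it travels with the string whether or not it gets compiled.
inline = re.compile(r'(?i)spider[- ]?man.')
second = inline.match(dailybugle)

if first and second:
    print(dailybugle)
```

Neither option makes the original call any less surprising, but both avoid handing a flag to a function that may not honor it.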

Anyone know a reason for this bad (and seemingly buggy) behavior?
