4.3 Built-in Module regex

 

This module provides regular expression matching operations similar to those found in Emacs.

Obsolescence note: This module is obsolete as of Python version 1.5; it is still being maintained because much existing code still uses it. All new code in need of regular expressions should use the new re  module, which supports the more powerful and regular Perl-style regular expressions. Existing code should be converted. The standard library module reconvert  helps in converting regex style regular expressions to re style regular expressions. (For more conversion help, see Andrew Kuchling's  ``regex-to-re HOWTO'' at http://www.python.org/doc/howto/regex-to-re/.)

By default the patterns are Emacs-style regular expressions (with one exception). There is a way to change the syntax to match that of several well-known Unix utilities. The exception is that Emacs' "\s" pattern is not supported, since the original implementation references the Emacs syntax tables.

This module is 8-bit clean: both patterns and strings may contain null bytes and characters whose high bit is set.

Please note: There is a little-known fact about Python string literals which means that you don't usually have to worry about doubling backslashes, even though they are used to escape special characters in string literals as well as in regular expressions. This is because Python doesn't remove backslashes from string literals if they are followed by an unrecognized escape character. However, if you want to include a literal backslash in a regular expression represented as a string literal, you have to quadruple it or enclose it in a singleton character class. E.g. to extract LATEX "\section{ ...}" headers from a document, you can use this pattern: '[\]section{\(.*\)}'. Another exception: the escape sequece "\b" is significant in string literals (where it means the ASCII bell character) as well as in Emacs regular expressions (where it stands for a word boundary), so in order to search for a word boundary, you should use the pattern '\\b'. Similarly, a backslash followed by a digit 0-7 should be doubled to avoid interpretation as an octal escape.


guido@python.org