#
# Secret Labs' Regular Expression Engine
#
# re-compatible interface for the sre matching engine
#
# This version of the SRE library can be redistributed under CNRI's
# Python 1.6 license. For any other use, please contact Secret Labs
# AB (info@pythonware.com).
#
# Portions of this engine have been developed in cooperation with
# CNRI. Hewlett-Packard provided funding for 1.6 integration and
# other compatibility work.
#

r"""Support for regular expressions (RE).

This module provides regular expression matching operations similar to
those found in Perl. It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.

Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves. You can
concatenate ordinary characters, so last matches the string 'last'.

The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?aiLmsux) The letters set the corresponding flags defined below.
    (?:...)  Non-grouping version of regular parentheses.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise.
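
As an illustration only (not part of the original module text), the
sketch below exercises a few of the constructs listed above: greedy
versus non-greedy repetition, a named group with a backreference, and
lookahead/lookbehind assertions. The sample strings and variable names
are made up for the example.

```python
import re

# Greedy "*" takes as much as possible; "*?" stops at the first chance.
html = "<b>bold</b>"
greedy = re.match(r"<.*>", html).group()      # the whole string
lazy = re.match(r"<.*?>", html).group()       # just the opening tag

# A named group, and a backreference to it with (?P=name).
m = re.search(r"(?P<word>\w+) (?P=word)", "say hello hello twice")
repeated = m.group("word")

# Lookahead: match "foo" only when "bar" follows, without consuming it.
ahead = re.findall(r"foo(?=bar)", "foobar foobaz")

# Lookbehind: digits that are directly preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $15, qty 3")

print(greedy, lazy, repeated, ahead, prices)
```

Note that the lookaround assertions are zero-width: only "foo" and "15"
end up in the results, not the text that was merely checked.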

The special sequences consist of "\\" and a character from the list
below. If the ordinary character is not on the list, then the
resulting RE will match the second character.
    \number  Matches the contents of the group of the same number.
    \A       Matches only at the start of the string.
    \Z       Matches only at the end of the string.
    \b       Matches the empty string, but only at the start or end of a word.
    \B       Matches the empty string, but not at the start or end of a word.
    \d       Matches any decimal digit; equivalent to the set [0-9] in
             bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the whole
             range of Unicode digits.
    \D       Matches any non-digit character; equivalent to [^\d].
    \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v] in
             bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the whole
             range of Unicode whitespace characters.
    \S       Matches any non-whitespace character; equivalent to [^\s].
    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]
             in bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the
             range of Unicode alphanumeric characters (letters plus digits
             plus underscore).
             With LOCALE, it will match the set [0-9_] plus characters defined
             as letters for the current locale.
    \W       Matches the complement of \w.
    \\       Matches a literal backslash.
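
The following sketch (an illustration added for this listing, with
made-up sample strings) shows how some of the special sequences above
behave: "\d" with and without the ASCII flag, "\b" as a word boundary,
a numeric backreference, and the difference between "$" and "\Z" when
the string ends in a newline.

```python
import re

# In str patterns, "\d" matches any Unicode decimal digit by default...
arabic_indic_three = "\u0663"
unicode_hit = re.fullmatch(r"\d", arabic_indic_three) is not None
# ...but only [0-9] when the ASCII flag is given.
ascii_hit = re.fullmatch(r"\d", arabic_indic_three, re.ASCII) is not None

# "\b" is a zero-width word boundary: find "cat" as a whole word only.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

# "\1" refers back to group 1: find the first doubled letter.
doubled = re.search(r"(\w)\1", "bookkeeper").group()

# "$" may match just before a trailing newline; "\Z" only at the very end.
dollar = re.search(r"end$", "the end\n") is not None
strict = re.search(r"end\Z", "the end\n") is not None

print(unicode_hit, ascii_hit, whole_words, doubled, dollar, strict)
```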

This module exports the following functions:
    match     Match a regular expression pattern to the beginning of a string.
    fullmatch Match a regular expression pattern to all of a string.
    search    Search a string for the presence of a pattern.
    sub       Substitute occurrences of a pattern found in a string.
    subn      Same as sub, but also return the number of substitutions made.
    split     Split a string by the occurrences of a pattern.
    findall   Find all occurrences of a pattern in a string.
    finditer  Return an iterator yielding a Match object for each match.
    compile   Compile a pattern into a Pattern object.
    purge     Clear the regular expression cache.
    escape    Backslash all special characters in a string.
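
A minimal usage sketch of the functions listed above, added here as an
illustration. The input text and the variable names are invented for
the example; only the module-level functions themselves come from this
module.

```python
import re

text = "alpha=1, beta=22, gamma=333"

# match anchors at the start of the string; search scans all of it.
at_start = re.match(r"beta", text)             # no match at position 0
anywhere = re.search(r"beta", text).start()    # index where "beta" begins

# fullmatch requires the pattern to cover the entire string.
whole = re.fullmatch(r"[a-z]+=\d+", "beta=22") is not None

# findall returns a list of tuples when the pattern has several groups.
pairs = re.findall(r"(\w+)=(\d+)", text)

# sub/subn accept a string or a callable; subn also reports the count.
doubled, count = re.subn(r"\d+", lambda m: str(int(m.group()) * 2), text)

# split with a capturing group keeps the separators in the result.
parts = re.split(r"(,) ", "a, b, c")

# escape makes arbitrary text safe to embed in a pattern.
literal = re.search(re.escape("1+1=2"), "so 1+1=2 holds").group()

print(pairs, doubled, count, parts, literal)
```

Passing a callable to sub or subn, as above, is useful when the
replacement depends on the matched text rather than being a fixed string.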

Each function other than purge and escape can take an optional 'flags' argument
consisting of one or more of the following module constants, joined by "|".
A, L, and U are mutually exclusive.
    A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                   match the corresponding ASCII character categories
                   (rather than the whole Unicode categories, which is the
                   default).
                   For bytes patterns, this flag is the only available
                   behaviour and needn't be specified.
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     For compatibility only. Ignored for string patterns (it
                   is the default), and forbidden for bytes patterns.
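
As a sketch of how the flags above combine (an added illustration; the
sample text and the "version" pattern are invented), flags can be
joined with "|", passed to compile, or set inline with the (?...)
syntax:

```python
import re

text = "First line\nsecond LINE\n"

# IGNORECASE | MULTILINE: "^" and "$" apply per line and case is ignored.
lines = re.findall(r"^\w+ line$", text, re.IGNORECASE | re.MULTILINE)

# Without DOTALL, "." stops at a newline; with it, "." crosses lines.
no_dotall = re.search(r"First.*second", text)
with_dotall = re.search(r"First.*second", text, re.DOTALL)

# VERBOSE allows whitespace and comments inside the pattern itself.
version = re.compile(r'''
    (\d+) \.   # major
    (\d+)      # minor
''', re.VERBOSE)
major_minor = version.match("3.11").groups()

# The inline form (?i) sets IGNORECASE from within the pattern.
inline = re.match(r"(?i)hello", "HELLO") is not None

print(lines, no_dotall, with_dotall is not None, major_minor, inline)
```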

This module also defines an exception 'error'.
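
A small sketch of catching that exception, added for illustration. The
helper name try_compile is hypothetical and not part of this module; it
simply wraps compile and reports an invalid pattern instead of raising.

```python
import re

def try_compile(pattern):
    # Hypothetical helper: return (compiled, None) on success,
    # or (None, message) when the pattern is not a valid RE.
    try:
        return re.compile(pattern), None
    except re.error as exc:
        return None, str(exc)

ok, ok_msg = try_compile(r"(a|b)+")
bad, bad_msg = try_compile(r"(unclosed")

print(ok is not None, ok_msg, bad, bool(bad_msg))
```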

"""

import enum
import sre_compile
import sre_parse
import functools
try:
    import _locale
except ImportError:
    _locale = None

# public symbols
__all__ = [
    "match", "fullmatch", "search", "sub", "subn", "split",
    "findall", "finditer", "compile", "purge", "template", "escape",
    "error", "Pattern", "Match", "A", "I", "L", "M", "S", "X", "U",
    "ASCII", "IGNORECASE", "LOCALE", "MULTILINE", "DOTALL", "VERBOSE",
    "UNICODE",
]

__version__ = "2.2.1"

class RegexFlag(enum.IntFlag):
    ASCII = A = sre_compile.SRE_FLAG_ASCII # assume ascii "locale"
    IGNORECASE = I = sre_compile.SRE_FLAG_IGNORECASE # ignore case
    LOCALE = L = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
    UNICODE = U = sre_compile.SRE_FLAG_UNICODE # assume unicode "locale"
    MULTILINE = M = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
    DOTALL = S = sre_compile.SRE_FLAG_DOTALL # make dot match newline
    VERBOSE = X = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments
    # sre extensions (experimental, don't rely on these)
    TEMPLATE = T = sre_compile.SRE_FLAG_TEMPLATE # disable backtracking
    DEBUG = sre_compile.SRE_FLAG_DEBUG # dump pattern after compilation

    def __repr__(self):
        if self._name_ is not None:
            return f're.{self._name_}'
        value = self._value_
        members = []
        negative = value < 0
        if negative:
            value = ~value
        for m in self.__class__:
            if value & m._value_:
                value &= ~m._value_
                members.append(f're.{m._name_}')
        if value:
            members.append(hex(value))
        res = '|'.join(members)
        if negative:
            if len(members) > 1:
                res = f'~({res})'
            else:
                res = f'~{res}'
        return res
    __str__ = object.__str__

globals().update(RegexFlag.__members__)

# sre exception
error = sre_compile.error

# --------------------------------------------------------------------
# public interface

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a Match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

def fullmatch(pattern, string, flags=0):
    """Try to apply the pattern to all of the string, returning
    a Match object, or None if no match was found."""
    return _compile(pattern, flags).fullmatch(string)

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a Match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a string, backslash escapes in it are processed. If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl. number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the Match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)

def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings. If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern is also returned as part of the resulting
    list. If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list."""
    return _compile(pattern, flags).split(string, maxsplit)

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

def finditer(pattern, string, flags=0):
    """Return an iterator over all non-overlapping matches in the
    string. For each match, the iterator returns a Match object.

    Empty matches are included in the result."""
    return _compile(pattern, flags).finditer(string)

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a Pattern object."
    return _compile(pattern, flags)

def purge():
    "Clear the regular expression caches"
    _cache.clear()
    _compile_repl.cache_clear()

def template(pattern, flags=0):
    "Compile a template pattern, returning a Pattern object"
    return _compile(pattern, flags|T)

# SPECIAL_CHARS
# closing ')', '}' and ']'
# '-' (a range in character set)
# '&', '~', (extended character set operations)
# '#' (comment) and WHITESPACE (ignored) in verbose mode
_special_chars_map = {i: '\\' + chr(i) for i in b'()[]{}?*+-|^$\\.&~# \t\n\r\v\f'}

def escape(pattern):
    """
    Escape special characters in a string.
    """
    if isinstance(pattern, str):
        return pattern.translate(_special_chars_map)
    else:
        pattern = str(pattern, 'latin1')
        return pattern.translate(_special_chars_map).encode('latin1')

Pattern = type(sre_compile.compile('', 0))
Match = type(sre_compile.compile('', 0).match(''))

# --------------------------------------------------------------------
# internals

_cache = {}  # ordered!

_MAXCACHE = 512
def _compile(pattern, flags):
    # internal: compile pattern
    if isinstance(flags, RegexFlag):
        flags = flags.value
    try:
        return _cache[type(pattern), pattern, flags]
    except KeyError:
        pass
    if isinstance(pattern, Pattern):
        if flags:
            raise ValueError(
                "cannot process flags argument with a compiled pattern")
        return pattern
    if not sre_compile.isstring(pattern):
        raise TypeError("first argument must be string or compiled pattern")
    p = sre_compile.compile(pattern, flags)
    if not (flags & DEBUG):
        if len(_cache) >= _MAXCACHE:
            # Drop the oldest item
            try:
                del _cache[next(iter(_cache))]
            except (StopIteration, RuntimeError, KeyError):
                pass
        _cache[type(pattern), pattern, flags] = p
    return p

@functools.lru_cache(_MAXCACHE)
def _compile_repl(repl, pattern):
    # internal: compile replacement pattern
    return sre_parse.parse_template(repl, pattern)

def _expand(pattern, match, template):
    # internal: Match.expand implementation hook
    template = sre_parse.parse_template(template, pattern)
    return sre_parse.expand_template(template, match)

def _subx(pattern, template):
    # internal: Pattern.sub/subn implementation helper
    template = _compile_repl(template, pattern)
    if not template[0] and len(template[1]) == 1:
        # literal replacement
        return template[1][0]
    def filter(match, template=template):
        return sre_parse.expand_template(template, match)
    return filter

# register myself for pickling

import copyreg

def _pickle(p):
    return _compile, (p.pattern, p.flags)

copyreg.pickle(Pattern, _pickle, _compile)

# --------------------------------------------------------------------
# experimental stuff (see python-dev discussions for details)

class Scanner:
    def __init__(self, lexicon, flags=0):
        from sre_constants import BRANCH, SUBPATTERN
        if isinstance(flags, RegexFlag):
            flags = flags.value
        self.lexicon = lexicon
        # combine phrases into a compound pattern
        p = []
        s = sre_parse.State()
        s.flags = flags
        for phrase, action in lexicon:
            gid = s.opengroup()
            p.append(sre_parse.SubPattern(s, [
                (SUBPATTERN, (gid, 0, 0, sre_parse.parse(phrase, flags))),
                ]))
            s.closegroup(gid, p[-1])
        p = sre_parse.SubPattern(s, [(BRANCH, (None, p))])
        self.scanner = sre_compile.compile(p)
    def scan(self, string):
        result = []
        append = result.append
        match = self.scanner.scanner(string).match
        i = 0
        while True:
            m = match()
            if not m:
                break
            j = m.end()
            if i == j:
                break
            action = self.lexicon[m.lastindex-1][1]
            if callable(action):
                self.match = m
                action = action(self, m.group())
            if action is not None:
                append(action)
            i = j
        return result, string[i:]
