Getting Started with Python's re Module

Introduction

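The re module is Python's own standard-library module for matching strings. Most of its functionality is built on regular expressions, a notation that most programming languages support in some form.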

pattern

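Before looking at the functions in the re module, let's first introduce the concept of a pattern.
A pattern can be thought of as a compiled matching rule. How do we get one? Simply call re.compile. For example: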

import re
pattern = re.compile(r'hello')

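Using a compiled pattern makes the code clearer, and it avoids processing the same regular expression again each time it is used, which can otherwise slow the program down.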
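As a minimal sketch of that reuse (the variable names and sample strings are illustrative and not from the original text), a compiled pattern exposes the matching operations as methods, so the expression is written once and applied to many inputs:

```python
import re

# Compile once, then reuse the pattern object for several strings.
greeting = re.compile(r'hello')

samples = ['hello world', 'say hello', 'goodbye']
# greeting.search(s) is the method form of re.search(greeting, s).
found = [bool(greeting.search(s)) for s in samples]
print(found)  # [True, True, False]
```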

flags

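The flags argument controls how matching behaves. Several flags can be combined with the bitwise OR operator '|' so that they take effect together, for example re.I | re.M.
The available values are:

re.I (full name: IGNORECASE): ignore case (the full name is given in parentheses, here and below)
re.M (full name: MULTILINE): multiline mode; changes the behavior of '^' and '$' so that they also match at the start and end of each line (see the figure above)
re.S (full name: DOTALL): dot-matches-all mode; changes the behavior of '.' so that it also matches a newline
re.L (full name: LOCALE): makes the predefined character classes \w \W \b \B depend on the current locale; in Python 3 this flag can only be used with bytes patterns
re.U (full name: UNICODE): makes the predefined character classes \w \W \b \B \s \S \d \D follow the character properties defined by Unicode; in Python 3 this is already the default for str patterns
re.X (full name: VERBOSE): verbose mode. The regular expression can span multiple lines, whitespace in it is ignored, and comments can be added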
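The following sketch shows a few of these flags in action. The sample text and variable names are made up for illustration:

```python
import re

text = 'Hello\nhello world'

# re.I ignores case; re.M lets '^' match at the start of every line.
starts = re.findall(r'^hello', text, re.I | re.M)
print(starts)  # ['Hello', 'hello']

# Without re.S, '.' stops at a newline; with it, '.' matches '\n' too.
no_dotall = re.search(r'Hello.hello', text)
with_dotall = re.search(r'Hello.hello', text, re.S)
print(no_dotall is None, with_dotall is not None)  # True True

# re.X allows whitespace and comments inside the pattern.
number = re.compile(r'''
    \d+      # integer part
    \.       # decimal point
    \d+      # fractional part
''', re.X)
decimal = number.search('pi is about 3.14').group()
print(decimal)  # 3.14
```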

Function Overview

re.match()

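Tries to match the pattern starting at position 0 of the string. It returns a Match object if the pattern matches there, and None otherwise. The pattern does not have to consume the whole string.

Usage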

re.match(pattern, string[, flags])

Example

import re
pattern = re.compile(r'\w+')
res = re.match(pattern, 'hello, hello world')
print(res.group())

Output

hello
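Because re.match() returns None when nothing matches at the start, calling group() on the result directly can raise AttributeError. A small sketch of guarding against that (first_word is a hypothetical helper, not part of the re module):

```python
import re

pattern = re.compile(r'\w+')

def first_word(text):
    """Return the word at the very start of text, or None if there is none."""
    m = pattern.match(text)
    # match() returns None on failure, so check before calling group().
    return m.group() if m else None

print(first_word('hello, hello world'))   # hello
print(first_word('$hello, hello world'))  # None
```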

re.search()

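Similar to re.match(), but the match does not have to begin at the start of the string: the string is scanned and the first match found is returned, or None if there is none.

Usage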

re.search(pattern, string[, flags])

Example

import re
pattern = re.compile(r'\w+')
res = re.search(pattern, '$hello, hello world')  # re.match would find nothing here
print(res.group())

Output

hello
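To make the difference from re.match() concrete, here is a short side-by-side sketch on the same input (variable names are illustrative):

```python
import re

pattern = re.compile(r'\w+')
text = '$hello, hello world'

m_match = pattern.match(text)    # '$' is not a word character, so no match at position 0
m_search = pattern.search(text)  # scans forward and finds 'hello' starting at index 1

print(m_match)            # None
print(m_search.group())   # hello
print(m_search.span())    # (1, 6)
```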

re.findall()

Searches the string and returns all non-overlapping matching substrings as a list.

Usage

re.findall(pattern, string[, flags])

Example

import re
pattern = re.compile(r'\d+')
res = re.findall(pattern, 'one1two2three3four4')
print(res)

Output

['1', '2', '3', '4']
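One detail worth knowing is that the shape of the result depends on the capturing groups in the pattern. A brief sketch, using patterns invented for illustration:

```python
import re

text = 'one1two2three3four4'

# No groups: each item is the whole match.
digits = re.findall(r'\d+', text)
print(digits)  # ['1', '2', '3', '4']

# One group: each item is that group's text.
words = re.findall(r'([a-z]+)\d', text)
print(words)   # ['one', 'two', 'three', 'four']

# Several groups: each item is a tuple of the groups.
pairs = re.findall(r'([a-z]+)(\d)', text)
print(pairs)   # [('one', '1'), ('two', '2'), ('three', '3'), ('four', '4')]
```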

re.finditer()

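Similar to re.findall(), but instead of a list of strings it returns an iterator that yields a Match object for each match.

Usage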

re.finditer(pattern, string[, flags])

Example

import re
pattern = re.compile(r'\d+')
res = re.finditer(pattern, 'one1two2three3four4')
print(' '.join(m.group() for m in res))  # iterate; the iterator itself has no group()

Output

1 2 3 4
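Match objects carry more than the matched text, which is the main reason to prefer finditer() over findall(). A minimal sketch that also records where each match starts:

```python
import re

text = 'one1two2three3four4'

# Each Match object exposes the matched text and its position in the string.
positions = [(m.group(), m.start()) for m in re.finditer(r'\d+', text)]
print(positions)  # [('1', 3), ('2', 7), ('3', 13), ('4', 18)]
```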

re.sub()

Replaces each match in the string with repl and returns the resulting new string.

Usage

re.sub(pattern, repl, string[, count])
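When repl is a string, it can refer to groups with \id, \g<id>, or \g<name>, but \0 cannot be used to refer to group 0 (use \g<0> for the whole match).
When repl is a function, it should accept exactly one argument (a Match object) and return the string to use as the replacement (group references in the returned string are not expanded).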

Example

import re
pattern = re.compile(r'(\w+) (\w+)')
res = re.sub(pattern, r'\2 \1', 'i say, hello world!')
print(res)

Output

say i, world hello!
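The other forms of repl described above, and the count argument, can be sketched as follows (the group names a and b and the function shout are hypothetical, chosen only for this example):

```python
import re

text = 'i say, hello world!'

# Named groups referenced with \g<name> in the replacement string.
swapped = re.sub(r'(?P<a>\w+) (?P<b>\w+)', r'\g<b> \g<a>', text)
print(swapped)  # say i, world hello!

# count=1 limits the substitution to the first match.
once = re.sub(r'(\w+) (\w+)', r'\2 \1', text, count=1)
print(once)     # say i, hello world!

# A function as repl receives the Match object and returns the replacement.
def shout(m):
    return m.group().upper()

upper = re.sub(r'\w+', shout, text)
print(upper)    # I SAY, HELLO WORLD!
```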

re.split()

Splits the string at each match of the pattern and returns the pieces as a list.

Usage

re.split(pattern, string[, maxsplit])

Example

import re
pattern = re.compile(r'\d+')
res = re.split(pattern, 'one1two2three3four4')
print(res)

Output

['one', 'two', 'three', 'four', '']
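The trailing empty string appears because the input ends with a match. The sketch below shows the maxsplit argument, what a capturing group does to the result, and one simple way to drop empty pieces (variable names are illustrative):

```python
import re

text = 'one1two2three3four4'

# maxsplit=2 stops after two splits; the remainder is returned unsplit.
limited = re.split(r'\d+', text, maxsplit=2)
print(limited)   # ['one', 'two', 'three3four4']

# A capturing group keeps the separators in the result.
kept = re.split(r'(\d+)', text)
print(kept)      # ['one', '1', 'two', '2', 'three', '3', 'four', '4', '']

# Filter out the empty strings produced by a match at the end of the input.
cleaned = [part for part in re.split(r'\d+', text) if part]
print(cleaned)   # ['one', 'two', 'three', 'four']
```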