Python Regular Expressions (2)

admin

2023-07-31 01:48:13

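Some predefined character sets can be written more compactly with escape codes. The re module recognizes three pairs of these codes, six in total: the lowercase and uppercase forms of three letters, where the uppercase form matches the opposite of the lowercase one.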

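\d : a digit
\D : a non-digit
\w : an alphanumeric character (letter, digit, or underscore)
\W : a non-alphanumeric character (anything \w does not match)
\s : whitespace (tab, space, newline, etc.)
\S : non-whitespace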
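As a quick illustration of the six codes, the sketch below runs each one over a sample string. The string and the variable names are made up for this example and are not from the original post.

```python
import re

# Hypothetical sample text, chosen only to exercise each escape code.
sample = "Room 42, floor_3\tEast"

digits = re.findall(r'\d+', sample)       # runs of digits
non_digits = re.findall(r'\D+', sample)   # runs of anything except digits
words = re.findall(r'\w+', sample)        # runs of letters, digits and underscores
non_words = re.findall(r'\W+', sample)    # runs of everything \w does not match
spaces = re.findall(r'\s', sample)        # individual whitespace characters
non_spaces = re.findall(r'\S+', sample)   # runs of non-whitespace

print(digits)       # ['42', '3']
print(words)        # ['Room', '42', 'floor_3', 'East']
print(non_spaces)   # ['Room', '42,', 'floor_3', 'East']
```

Note that the underscore in floor_3 is matched by \w, which is why it stays inside a single word.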

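To specify where in the text a match must occur, use anchors. They are written much like escape codes, but they match a position rather than a character.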

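^ : start of the string, or of each line in multi-line mode
$ : end of the string, or of each line in multi-line mode
\A : start of the string
\Z : end of the string
\b : the empty string at the beginning or end of a word
\B : the empty string not at the beginning or end of a word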

import re
the_str = "This is some text -- with punctuation"
re.search(r'^\w+', the_str).group(0)       # 'This'
re.search(r'\A\w+', the_str).group(0)      # 'This'
re.search(r'\w+\S*$', the_str).group(0)    # 'punctuation'
re.search(r'\w+\S*\Z', the_str).group(0)   # 'punctuation'
re.search(r'\w*t\W*', the_str).group(0)    # 'text -- ' (including the trailing space)
re.search(r'\bt\w+', the_str).group(0)     # 'text'
re.search(r'\Bt*\B', the_str).group(0)     # '' (t* may match zero characters, so this is an empty match inside "This")
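The difference between ^ and \A (and between $ and \Z) only shows up when the text spans several lines and re.MULTILINE is set. A minimal sketch, using a made-up two-line string:

```python
import re

# Hypothetical two-line sample, used only for illustration.
text = "first line\nsecond line"

# Without flags, ^ only matches at the very start of the string.
default_starts = re.findall(r'^\w+', text)                   # ['first']

# With re.MULTILINE, ^ also matches right after each newline.
multiline_starts = re.findall(r'^\w+', text, re.MULTILINE)   # ['first', 'second']

# \A ignores re.MULTILINE and still anchors to the start of the string.
string_starts = re.findall(r'\A\w+', text, re.MULTILINE)     # ['first']

# Likewise, $ matches at the end of every line, \Z only at the very end.
line_ends = re.findall(r'\w+$', text, re.MULTILINE)          # ['line', 'line']
string_ends = re.findall(r'\w+\Z', text, re.MULTILINE)       # ['line']
```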

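Groups let you dissect a match. Put simply, each pair of parentheses () in a regular expression marks off part of the pattern as a group, and the group() method retrieves what a given group matched. Index 0 is the text matched by the whole expression; indexes from 1 upward refer to the groups in order of their opening parenthesis, left to right. groups() returns a tuple with the matches of all groups.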

import re
the_str = "--aabb123bbaa"
pattern = r'(\W+)([a-z]+)(\d+)(\D+)'
match = re.search(pattern, the_str)
match.groups()    # ('--', 'aabb', '123', 'bbaa')
match.group(0)    # '--aabb123bbaa'
match.group(1)    # '--'
match.group(2)    # 'aabb'
match.group(3)    # '123'
match.group(4)    # 'bbaa'
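The same idea applies to more practical input. The sketch below pulls the parts of a date out of a timestamp; the input string and the YYYY-MM-DD layout are assumptions made for this example.

```python
import re

# Hypothetical input; the YYYY-MM-DD layout is assumed for illustration.
stamp = "2023-07-31 01:48:13"

match = re.search(r'(\d{4})-(\d{2})-(\d{2})', stamp)
year, month, day = match.groups()   # one string per pair of parentheses
whole = match.group(0)              # text matched by the entire pattern
pair = match.group(1, 3)            # several indexes at once give a tuple
span = match.span(2)                # (start, end) offsets of group 2

print(year, month, day)   # 2023 07 31
print(whole)              # 2023-07-31
print(pair)               # ('2023', '31')
print(span)               # (5, 7)
```

All captured values are strings; convert them with int() if numbers are needed.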

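Python extends the group syntax so that each group can be given a name and then referred to by that name. The syntax is (?P<name>pattern). groupdict() returns a dictionary that maps each group name to the text it matched.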

import re
the_str = "--aabb123bbaa"
pattern = r'(?P<not_al_and_num>\W+)(?P<al>[a-z]+)(?P<num>\d+)(?P<not_num>\D+)'
match = re.search(pattern, the_str)
match.groups()    # ('--', 'aabb', '123', 'bbaa')
match.groupdict() # {'not_al_and_num': '--', 'al': 'aabb', 'num': '123', 'not_num': 'bbaa'}
match.group(0)                    # '--aabb123bbaa'
match.group(1)                    # '--'
match.group(2)                    # 'aabb'
match.group(3)                    # '123'
match.group(4)                    # 'bbaa'
match.group('not_al_and_num')     # '--'
match.group('al')                 # 'aabb'
match.group('num')                # '123'
match.group('not_num')            # 'bbaa'
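Named groups are most useful when the pattern describes a record with meaningful fields. A small sketch, assuming a made-up "key=value" setting line (the group names key and value are chosen for this example):

```python
import re

# Hypothetical "key=value" setting, used only for illustration.
line = "timeout=30"

match = re.search(r'(?P<key>[a-z]+)=(?P<value>\d+)', line)
settings = {match.group('key'): int(match.group('value'))}
by_name = match.groupdict()   # every named group, keyed by its name
by_index = match.group(1)     # named groups still keep their numeric index

print(settings)   # {'timeout': 30}
print(by_name)    # {'key': 'timeout', 'value': '30'}
print(by_index)   # timeout
```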

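Take care with the group() method used above: it only works when there was a match. When nothing matches, re.search() returns None, and calling group() on None raises AttributeError. Whenever a match is not guaranteed but you need the matched text, check the result first.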
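One way to do that check is to test the result against None before touching it. A minimal sketch, where first_number is a hypothetical helper written for this example:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None when there is none."""
    match = re.search(r'\d+', text)
    if match is None:        # re.search() returns None when nothing matches
        return None
    return match.group(0)

found = first_number("--aabb123bbaa")
missing = first_number("no digits here")

print(found)     # 123
print(missing)   # None
```

Returning None keeps the "no match" case explicit for the caller; raising a custom exception or returning a default value would work equally well, depending on the program.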

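The re module supports a number of flags, passed through the optional flags parameter that search(), compile(), and similar functions accept. Flags enable special behavior such as ignoring case or searching across multiple lines.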

import re
the_str = "this Text"
re.findall(r'\bt\w+', the_str)                 # ['this']
re.findall(r'\bt\w+', the_str, re.IGNORECASE)  # ['this', 'Text']
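Several flags can be applied at once by combining them with the bitwise OR operator, and the same options can also be written inline at the start of the pattern. A short sketch with a made-up two-line string:

```python
import re

# Hypothetical sample text with mixed case across two lines.
text = "this Text\nTHAT thing"

# Combine flags with | when compiling or searching.
pattern = re.compile(r'^t\w+', re.IGNORECASE | re.MULTILINE)
line_starts = pattern.findall(text)

# Equivalent inline form: (?i) ignores case, (?m) enables multi-line mode.
inline = re.findall(r'(?im)^t\w+', text)

print(line_starts)   # ['this', 'THAT']
print(inline)        # ['this', 'THAT']
```

Only words at the start of a line are returned, so "Text" and "thing" are skipped even though they begin with t.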

There are many more search options; see the documentation for details: http://docs.python.org/2/library/re.html#module-re
