RegularExpression
Python Regular Expression[re]
EN version
Syntax | Description | Character | Expression Example | Full Match Example | |
---|
General Characters | | | | | |
Match itself | Matches the character itself | abc | abc | abc | |
. | Matches any character except a newline \n . In DOTALL mode, it can also match newlines. | a.c | a.c | abc | |
\ | Escape character, makes the following character literal. Use \ to match special characters. | a\.c | a\.c | a.c | |
[ ... ] | Character set (character class). Matches any character in the set. Can specify characters or ranges. | a[bcd]e | a[bcd]e | abe , ace , ade | |
| Special characters lose their meaning in character sets. Escape - , ] , ^ if needed. | [a-c] | [^abc] | Any character except abc | |
Predefined Character Sets (can also be used within [ ... ] ) | | | | | |
\d | Matches a digit [0-9] | a\dc | a\dc | a1c , a3c | |
\D | Matches a non-digit [^0-9] | a\Dc | a\Dc | abc , axc | |
\s | Matches a whitespace character [ \t\r\n\f\v] | a\sc | a\sc | a c , a\tc | |
\S | Matches a non-whitespace character [^ \t\r\n\f\v] | a\Sc | a\Sc | abc , axc | |
\w | Matches a word character [A-Za-z0-9_] | a\wc | a\wc | abc , a1c | |
\W | Matches a non-word character [^A-Za-z0-9_] | a\Wc | a\Wc | a@c | |
Quantifiers (used after a character or ( ... ) ) | | | | | |
* | Matches the preceding character 0 or more times | abc* | abc* | ab , abc , abcc | |
+ | Matches the preceding character 1 or more times | abc+ | abc+ | abc , abcc | |
? | Matches the preceding character 0 or 1 time | abc? | abc? | ab , abc | |
{m} | Matches the preceding character exactly m times | ab{2}c | ab{2}c | abbc | |
{m,n} | Matches the preceding character between m and n times, inclusive | ab{1,2}c | ab{1,2}c | abc , abbc | |
*? +? ?? | Makes * + ? {m,n} non-greedy | a.*? | See examples below | Non-greedy match | |
Boundary Matchers (do not consume characters in the string) | | | | | |
^ | Matches the beginning of the string. In multiline mode, matches the start of each line. | ^abc | ^abc | abc | |
$ | Matches the end of the string. In multiline mode, matches the end of each line. | abc$ | abc$ | abc | |
\A | Matches only at the start of the string | \Aabc | \Aabc | abc | |
\Z | Matches only at the end of the string | abc\Z | abc\Z | abc | |
\b | Matches a word boundary | \babc\b | \babc\b | abc | |
\B | Matches a non-word boundary | a\Bbc | a\Bbc | a@bc | |
Logical and Grouping | | | | | |
` | ` | Matches either the expression on the left or right. Left side is tried first. | abc | def | abc|def | abc , def |
( ... ) | Groups expressions and assigns a number from 1 onwards for backreferences | (abc){2} | (abc){2} | abcabc | |
(?P<name> ... ) | Named group, provides an additional name for backreferences | | | | |
\<number> | Refers to the group with the given number in backreference | (\d)abc\1 | (\d)abc\1 | 1abc1 | |
(?P=name) | Refers to the named group in backreference | | | | |
Special Constructs (not counted as a group) | | | | | |
(?: ... ) | Non-capturing group | | | | |
(?iLmsux) | Mode modifiers to alter matching behavior, can specify multiple | (?i)abc | (?i)abc | Abc , ABC | |
(?# ... ) | Inline comment, ignored by the regex engine | abc(?#comment)123 | abc(?#comment)123 | abc123 | |
(?= ... ) | Positive lookahead, does not consume characters | a(?=\d) | a(?=\d) | a1 , a2 | |
(?! ... ) | Negative lookahead, does not consume characters | a(?!\d) | a(?!\d) | a , ab | |
(?<= ... ) | Positive lookbehind | (?<=\d)a | (?<=\d)a | 1a , 2a | |
(?<! ... ) | Negative lookbehind | (?<!\d)a | (?<!\d)a | a | |
(?(id/name)yes-pattern | no-pattern) | Conditional matching, matches yes-pattern if the group matches, else no-pattern | (\d)abc(?(1)\d | abc) | 1abc1 , abcabc | | |
CN version
语法 | 说明 | 字符 | 表达式实例 | 完整匹配的字符串 | |
---|
一般字符 | | | | | |
匹配自身 | 匹配自身字符 | abc | abc | abc | |
. | 匹配任意除换行符\n 外的字符。在DOTALL模式中也能匹配换行符。 | a.c | a.c | abc | |
\ | 转义字符,使后一个字符恢复原意。如字符串中有特殊字符要匹配时,可用 \ 转义 | a\.c | a\.c | a.c | |
[ ... ] | 字符集(字符类),对应位置可以是字符集中任意字符。字符集可以列出字符或用范围表示。 | a[bcd]e | a[bcd]e | abe , ace , ade | |
| 特殊字符在字符集中失去原有含义。字符集中如需 - 、] 、^ ,需转义或调整位置。 | [a-c] | [^abc] | 除abc 以外的字符 | |
预定义字符集 (可以写在字符集[ ... ] 中) | | | | | |
\d | 匹配数字 [0-9] | a\dc | a\dc | a1c , a3c | |
\D | 匹配非数字 [^0-9] | a\Dc | a\Dc | abc , axc | |
\s | 匹配空白字符 [ \t\r\n\f\v] | a\sc | a\sc | a c , a\tc | |
\S | 匹配非空白字符 [^ \t\r\n\f\v] | a\Sc | a\Sc | abc , axc | |
\w | 匹配单词字符 [A-Za-z0-9_] | a\wc | a\wc | abc , a1c | |
\W | 匹配非单词字符 [^A-Za-z0-9_] | a\Wc | a\Wc | a@c | |
数量词 (用在字符或(...) 之后) | | | | | |
* | 匹配前一个字符 0 次或多次 | abc* | abc* | ab , abc , abcc | |
+ | 匹配前一个字符 1 次或多次 | abc+ | abc+ | abc , abcc | |
? | 匹配前一个字符 0 次或 1 次 | abc? | abc? | ab , abc | |
{m} | 匹配前一个字符 m 次 | ab{2}c | ab{2}c | abbc | |
{m,n} | 匹配前一个字符 m 至 n 次,m 或 n 可省略 | ab{1,2}c | ab{1,2}c | abc , abbc | |
*? +? ?? | 使 * + ? {m,n} 变成非贪婪模式 | a.*? | 示例见下文 | 非贪婪匹配 | |
边界匹配 (不消耗待匹配字符串中的字符) | | | | | |
^ | 匹配字符串开头,在多行模式中匹配每一行的开头 | ^abc | ^abc | abc | |
$ | 匹配字符串末尾,在多行模式中匹配每一行的末尾 | abc$ | abc$ | abc | |
\A | 仅匹配字符串开头 | \Aabc | \Aabc | abc | |
\Z | 仅匹配字符串末尾 | abc\Z | abc\Z | abc | |
\b | 匹配单词边界 | \babc\b | \babc\b | abc | |
\B | 匹配非单词边界 | a\Bbc | a\Bbc | a@bc | |
逻辑、分组 | | | | | |
` | ` | 代表左右表达式任意匹配一个,左表达式优先 | abc | def | abc|def | abc , def |
( ... ) | 分组,编号从1开始,可使用编号引用 | (abc){2} | (abc){2} | abcabc | |
(?P<name> ...) | 分组,除编号外再指定别名 | | | | |
\<number> | 引用编号为的分组匹配到的字符串 | (\d)abc\1 | (\d)abc\1 | 1abc1 | |
(?P=name) | 引用别名为的分组匹配到的字符串 | | | | |
特殊构造 (不作为分组) | | | | | |
(?: ...) | 非捕获性分组 | | | | |
(?iLmsux) | 模式修饰符,用于调整匹配模式 | (?i)abc | (?i)abc | Abc , ABC | |
(?# ...) | 注释,# 后的内容会被忽略 | abc(?# comment)123 | abc123 | abc123 | |
(?= ...) | 正向预查,不消耗字符串内容 | a(?=\d) | a(?=\d) | a1 , a2 | |
(?! ...) | 负向预查,不消耗字符串内容 | a(?!\d) | a(?!\d) | a , ab | |
(?<= ...) | 正向后发断言 | (?<=\d)a | (?<=\d)a | 1a , 2a | |
(?<! ...) | 负向后发断言 | (?<!\d)a | (?<!\d)a | a | |
(?(id/name)yes-pattern | no-pattern) | 条件判断,若编号或别名组匹配到字符则匹配 yes-pattern ,否则 no-pattern | (\d)abc(?(1)\d | abc) | 1abc1 , abcabc | | |