A regular expression is a special text string for describing a search pattern. Regular expressions are extremely useful in extracting information from text such as code, log files, spreadsheets or even documents.
Some basic patterns:
abc… – Letters.
123… – Digits.
\d – Any digit.
\D – Any non-digit character.
. – Metacharacter: any character except a newline.
\. – A literal dot (period) character.
[abc] – Only a, b or c.
[^abc] – Any character except a, b or c.
[0-4] – Any single digit from 0 to 4 (character range).
\w – Any word character (letter, digit or underscore), equivalent to [A-Za-z0-9_] for ASCII text.
\W – Any non-word character, equivalent to [^A-Za-z0-9_] for ASCII text.
\s – Any whitespace character, such as space ( ), tab (\t), newline (\n) or carriage return (\r).
\S – Any non-whitespace.
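The character classes above can be tried out with a short sketch. It uses Python's built-in `re` module, which is an assumption on our part: the post does not name a language, but its `(?P<name>...)` syntax is Python's. The sample strings are made up for illustration.

```python
import re

# \d and \D: digits versus everything else
digits = re.findall(r"\d", "a1b22")
non_digits = re.findall(r"\D", "a1b2")
print(digits, non_digits)  # ['1', '2', '2'] ['a', 'b']

# . matches any character except a newline; \. matches only a literal period
any_char = re.findall(r"a.c", "abc a.c a\nc")
literal_dot = re.findall(r"a\.c", "abc a.c")
print(any_char, literal_dot)  # ['abc', 'a.c'] ['a.c']

# Sets, negated sets and ranges each match exactly one character
in_set = re.findall(r"[abc]", "abcdef")        # ['a', 'b', 'c']
not_in_set = re.findall(r"[^abc]", "abcdef")   # ['d', 'e', 'f']
in_range = re.findall(r"[0-4]", "0123456789")  # ['0', '1', '2', '3', '4']

# \w+ stops at anything that is not a letter, digit or underscore
words = re.findall(r"\w+", "user_name: admin-01")
print(words)  # ['user_name', 'admin', '01']

# \s finds whitespace; \S+ collects the runs between it
spaces = re.findall(r"\s", "a b\tc\n")    # [' ', '\t', '\n']
tokens = re.findall(r"\S+", "a  b\tc\n")  # ['a', 'b', 'c']
print(tokens)
```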
re{m} – Match ‘re’ exactly m times.
re{m,n} – Match ‘re’ from m to n times.
re{m,} – Match ‘re’ at least m times. ‘re’ may be a single character, a metacharacter, a character set or a group.
ex: a{3}, [xyz]{5}, .{2,6}
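Here is a sketch of the {m}, {m,n} and {m,} quantifiers in Python's `re` module (the language is an assumption). The input strings are illustrative only.

```python
import re

exactly_three = re.fullmatch(r"a{3}", "aaa") is not None   # True
too_few = re.fullmatch(r"a{3}", "aa") is not None          # False
five_letters = re.findall(r"[xyz]{5}", "xyzzy xyz")        # ['xyzzy']

# {m,n} is greedy: it takes up to n characters whenever it can
two_to_four = re.findall(r"\d{2,4}", "1 12 123 12345")     # ['12', '123', '1234']
at_least_three = re.findall(r"\d{3,}", "7 42 1000 123456") # ['1000', '123456']
chunks = re.findall(r".{2,6}", "abcdefgh")                 # ['abcdef', 'gh']
print(two_to_four, chunks)
```

Note how "12345" yields only "1234". The leftover "5" is shorter than the minimum of two, so it is not matched.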
re* – Match ‘re’ 0 or more times.
re+ – Match ‘re’ 1 or more times.
re? – Match ‘re’ 0 or 1 times (makes ‘re’ optional).
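The three shorthand quantifiers can be contrasted in a minimal Python sketch, assuming the standard `re` module. The example words are made up.

```python
import re

# u? makes the "u" optional, so both spellings match but "colouur" does not
spellings = re.findall(r"colou?r", "color colour colouur")  # ['color', 'colour']

# b* allows zero "b"s, while b+ requires at least one
star = re.findall(r"ab*c", "ac abc abbc")  # ['ac', 'abc', 'abbc']
plus = re.findall(r"ab+c", "ac abc abbc")  # ['abc', 'abbc']
print(spellings, star, plus)
```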
^re – Match ‘re’ only at the beginning of the line.
re$ – Match ‘re’ only at the end of the line.
^re$ – Match ‘re’ only if it spans the whole line, from beginning to end.
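A sketch of the anchors in Python's `re` module (an assumed language). In Python, ^ and $ refer to the start and end of the whole string unless the `re.MULTILINE` flag is set. The log text below is invented for illustration.

```python
import re

starts = bool(re.search(r"^ERROR", "ERROR: disk full"))     # True
not_start = bool(re.search(r"^ERROR", "Last line: ERROR"))  # False
ends = bool(re.search(r"\.log$", "app.log"))                # True
whole = bool(re.search(r"^\d+$", "2018"))                   # True
partial = bool(re.search(r"^\d+$", "2018a"))                # False

# Without re.MULTILINE, ^ only matches at the start of the whole string;
# with it, ^ and $ match at the start and end of every line
log = "INFO start\nERROR disk full\nINFO done"
per_line = re.findall(r"^ERROR.*$", log, flags=re.MULTILINE)  # ['ERROR disk full']
whole_string = re.findall(r"^ERROR.*$", log)                  # []
print(per_line, whole_string)
```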
(re) – Capturing group. Any subpattern inside () is captured as a group for further processing.
ex: ^(IMG\d+)\.png$ captures the filename without the extension (.png)
>> IMG2018.png
group 1: IMG2018
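The example above can be reproduced in Python, assuming the standard `re` module. A match object exposes the captured text through `group()`, and a non-matching name simply returns `None`.

```python
import re

pattern = re.compile(r"^(IMG\d+)\.png$")

m = pattern.match("IMG2018.png")
print(m.group(0))  # IMG2018.png  (the whole match)
print(m.group(1))  # IMG2018      (group 1)

# A name that does not fit the pattern returns None instead of a match object
no_match = pattern.match("IMG2018.jpg")
print(no_match)    # None
```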
(r(e)) – Nested groups: used to extract several layers of information from the same match. Groups are numbered by the position of their opening parenthesis.
ex: ^(IMG(\d+))\.png$
>> IMG39.png
group 1: IMG39
group 2: 39
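A Python sketch of the nested-group example, assuming the `re` module. The outer group holds the name and the inner group holds just the number. The file list is hypothetical. It shows how group 2 could be used to pull out image numbers for sorting.

```python
import re

pattern = re.compile(r"^(IMG(\d+))\.png$")

m = pattern.match("IMG39.png")
print(m.groups())  # ('IMG39', '39')

# Group 2 gives direct access to the inner number
files = ["IMG39.png", "IMG2018.png", "IMG7.png", "notes.txt"]  # hypothetical names
numbers = []
for name in files:
    found = pattern.match(name)
    if found:  # skip names that do not match the pattern
        numbers.append(int(found.group(2)))
print(sorted(numbers))  # [7, 39, 2018]
```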
re1|re2 – Match ‘re1’ or ‘re2’.
(re1|re2) – Match ‘re1’ or ‘re2’ and capture the result as a group.
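The difference between a bare `|` and a grouped one is easy to miss, so here is a short Python sketch (`re` module assumed). Without parentheses, the alternation splits the entire pattern in two.

```python
import re

# | has the lowest precedence: this pattern means "^cat" OR "dog$"
loose = bool(re.search(r"^cat|dog$", "catalog"))     # True, "^cat" matches
# Grouping limits | to the alternatives inside ()
strict = bool(re.search(r"^(cat|dog)$", "catalog"))  # False, must be exactly cat or dog

# The group also records which alternative matched
m = re.match(r"^(cat|dog)s?$", "dogs")
print(loose, strict, m.group(1))  # True False dog
```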
re(?=foo) – Lookahead. Match ‘re’ only if it is followed by ‘foo’; ‘foo’ itself is not part of the match.
ex: \w+(?=\.)
>> hello world.
Full match: world
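A sketch of the lookahead example in Python (the `re` module is assumed). It also shows that the lookahead is zero-width: the period is checked but not consumed. The second input sentence is made up.

```python
import re

# Every word directly followed by a period
before_dot = re.findall(r"\w+(?=\.)", "hello world. foo bar.")  # ['world', 'bar']

m = re.search(r"\w+(?=\.)", "hello world.")
# The match ends at index 11, just before the ".", because lookahead consumes nothing
print(m.group(0), m.end())  # world 11
```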
re(?!foo) – Negative lookahead. Match ‘re’ only if it is not followed by ‘foo’.
ex: \w+(?!\.)
>> hello.
Full match: hell
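The result "hell" comes from backtracking. \w+ first grabs "hello", the lookahead fails on ".", and the engine gives back one letter; "hell" is followed by "o", so it matches. A Python sketch (`re` assumed) reproduces this. It also shows one possible fix, adding \b so that only whole words are accepted.

```python
import re

# Reproduces the example above
partial_word = re.search(r"\w+(?!\.)", "hello.").group(0)  # 'hell'

# \b forces \w+ to end at a word boundary, so "hell" is no longer accepted
no_match = re.search(r"\w+\b(?!\.)", "hello.")                # None
whole_word = re.search(r"\w+\b(?!\.)", "hello world.").group(0)  # 'hello'
print(partial_word, no_match, whole_word)  # hell None hello
```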
(?<=foo)re – Lookbehind. Match ‘re’ only if it is preceded by ‘foo’.
ex: (?<=\?)\w+
>> ?hello
Full match: hello
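A sketch of lookbehind in Python's `re` module (an assumed language). One restriction is worth knowing: Python's `re` only accepts lookbehind patterns of fixed width, so a quantified lookbehind such as `(?<=\?+)` is rejected when compiled.

```python
import re

# Words preceded by "?", without including the "?" itself
after_q = re.findall(r"(?<=\?)\w+", "?hello and ?world")  # ['hello', 'world']
print(after_q)

# A variable-width lookbehind raises re.error in Python's re module
try:
    re.compile(r"(?<=\?+)\w+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```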
(?<!foo)re – Negative lookbehind. Match ‘re’ only if it is not preceded by ‘foo’.
ex: (?<!\?)\w+
>> ?hello
Full match: ello
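Like the negative-lookahead case, "ello" is a partial word. "h" is preceded by "?", but "e" is preceded by "h", so the match starts one character later. A Python sketch (`re` assumed) reproduces this. It then adds \b, one possible way to keep only whole words that are not preceded by "?".

```python
import re

# Reproduces the example above
partial = re.findall(r"(?<!\?)\w+", "?hello")  # ['ello']

# With \b the match must start at a word boundary, so "?hello" is skipped entirely
whole = re.findall(r"(?<!\?)\b\w+", "?hello world")  # ['world']
print(partial, whole)
```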
(?P<name>re) – Named group. Match ‘re’ and capture it as a group that can be referred to by name.
ex: (?P<id>\w+)
>> Python
Full match: Python
Group ‘id’: Python
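In Python's `re` module (assumed here), named groups are read back with `group("name")` or all at once with `groupdict()`. The second pattern parses a hypothetical "key=value" line. It shows that named groups still keep their numeric index.

```python
import re

m = re.search(r"(?P<id>\w+)", "Python")
print(m.group("id"))  # Python
print(m.groupdict())  # {'id': 'Python'}

# Hypothetical key=value line with two named groups
kv = re.match(r"(?P<key>\w+)=(?P<value>\w+)", "user=admin")
print(kv.group("key"), kv.group("value"))  # user admin
print(kv.group(1))  # user  (named groups are numbered too)
```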
(?P=name) – Backreference. Match the same text that was matched earlier by the group named name.
ex: (?P<a>\w+) (?P=a)
>> hello hello
Full match: hello hello
Group ‘a’: hello
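A common use of a backreference is spotting an accidentally doubled word. Here is a Python sketch (`re` module assumed; sample sentences invented). Without \b, the pattern can also match across a word, as in the "is" at the end of "this" followed by " is".

```python
import re

m = re.search(r"(?P<a>\w+) (?P=a)", "hello hello")
print(m.group(0), "|", m.group("a"))  # hello hello | hello

# Without \b the group can start mid-word: "this is" matches as "is is"
false_positive = re.search(r"(?P<a>\w+) (?P=a)", "this is").group(0)  # 'is is'

# Word boundaries restrict the check to whole repeated words
guarded = re.search(r"\b(?P<a>\w+) (?P=a)\b", "this is")  # None
doubled = re.search(r"\b(?P<a>\w+) (?P=a)\b", "this is is a test").group("a")  # 'is'
print(false_positive, guarded, doubled)
```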