Regular expression

A regular expression is a special text string that describes a search pattern. Regular expressions are useful for extracting information from text such as code, log files, spreadsheets, or documents.

Some basic patterns:

abc…      – Letters.

123…      – Digits.

\d            – Any digit.

\D            – Any non-digit character.

.              – Metacharacter: any character (except a newline, by default).

\.              – Dot character, period.

[abc]       – Only a, b or c.

[^abc]     – Any character except a, b, or c.

[0-4]        – Only match any single digit character from 0 to 4 (character ranges).

\w            – Any alphanumeric character or underscore, = [A-Za-z0-9_]

\W           – Any character that is not alphanumeric or an underscore, = [^A-Za-z0-9_]

\s              – Any whitespace: space ( ), tab (\t), newline (\n), carriage return (\r).

\S             – Any non-whitespace.
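
The character classes above can be tried out with Python's built-in re module. This is a minimal sketch; the sample string is made up for illustration.

```python
import re

# Hypothetical sample text, chosen only to exercise the classes above.
text = "Room 42, floor 3.5\tEast_wing"

digits = re.findall(r"\d", text)           # every single digit
chunks = re.findall(r"\S+", text)          # runs of non-whitespace characters
words = re.findall(r"\w+", text)           # runs of letters, digits and underscore
dots = re.findall(r"\.", text)             # an escaped dot matches only a period
low_digits = re.findall(r"[0-4]", text)    # single digits in the range 0 to 4
not_abc = re.findall(r"[^abc]", "aXbYc")   # any character except a, b, c

print(digits)      # ['4', '2', '3', '5']
print(chunks)      # ['Room', '42,', 'floor', '3.5', 'East_wing']
print(words)       # ['Room', '42', 'floor', '3', '5', 'East_wing']
print(dots)        # ['.']
print(low_digits)  # ['4', '2', '3']
print(not_abc)     # ['X', 'Y']
```

Note that for Python 3 string patterns, \w, \d and \s also match non-ASCII Unicode characters unless the re.ASCII flag is passed.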

re{m}     – Match the ‘re’ exactly m times.

re{m,n}  – Match the ‘re’ from m to n times.

re{m,}     – Match the ‘re’ at least m times. ‘re’ may be any character, character class, or metacharacter.
ex: a{3}, [xyz]{5}, .{2,6}

re*           – Match ‘re’ 0 or more times.

re+          – Match ‘re’ 1 or more times.

re?          – Match ‘re’ 0 or 1 times (optional).
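
A short sketch of the quantifiers in Python. The helper name fits and the sample strings are assumptions made for this example only.

```python
import re

def fits(pattern, text):
    """Return True when the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

exact = fits(r"a{3}", "aaa")             # exactly three 'a'
too_many = fits(r"a{3}", "aaaa")         # four 'a' is not exactly three
bounded = fits(r".{2,6}", "abcd")        # between 2 and 6 of any character
at_least = fits(r"[xyz]{2,}", "xyzzy")   # two or more of x, y, z

star = re.findall(r"ab*", "a ab abbb")   # 'a' followed by 0 or more 'b'
plus = re.findall(r"ab+", "a ab abbb")   # 'a' followed by 1 or more 'b'
optional = re.findall(r"colou?r", "color colour")  # the 'u' is optional

print(exact, too_many, bounded, at_least)  # True False True True
print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['color', 'colour']
```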

^re          – Match ‘re’ only at the beginning of a line.

re$          – Match ‘re’ only at the end of a line.

^re$        – Match a line that consists entirely of ‘re’, from beginning to end.
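
In Python, ^ and $ anchor to the start and end of the whole string unless the re.MULTILINE flag is given, in which case they also match at each line boundary. A minimal sketch with a made-up three-line log:

```python
import re

# Hypothetical log text with three lines.
log = "ERROR disk full\nINFO ERROR count reset\nERROR"

starts = re.findall(r"^ERROR", log, flags=re.MULTILINE)   # lines beginning with ERROR
ends = re.findall(r"ERROR$", log, flags=re.MULTILINE)     # lines ending with ERROR
whole = re.findall(r"^ERROR$", log, flags=re.MULTILINE)   # lines that are exactly ERROR
no_flag = re.findall(r"^ERROR", log)                      # without the flag: start of string only

print(len(starts))   # 2 (first and third line)
print(len(ends))     # 1 (third line)
print(len(whole))    # 1 (third line)
print(len(no_flag))  # 1
```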

(re)          – Capturing group. Any subpattern inside () is captured as a group for further processing.
ex: ^(IMG\d+)\.png$ captures the filename without the extension (.png)
>> IMG2018.png
group 1: IMG2018

(r(e))       – Nested groups: used to extract multiple layers of information from a single match. Groups are numbered by the order of their opening parentheses.
ex: ^(IMG(\d+))\.png$
>> IMG39.png
group 1: IMG39
group 2: 39

re1|re2  – Match ‘re1’ or ‘re2’.

(re1|re2) – Match and group re1 or re2.
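
The grouping examples above, sketched in Python. The DSC prefix and the jpg extension in the alternation example are assumptions added for illustration.

```python
import re

# Nested groups: group 1 is the outer pair of parentheses, group 2 the inner pair.
m = re.match(r"^(IMG(\d+))\.png$", "IMG39.png")
outer, inner = m.group(1), m.group(2)

# Alternation inside a group: accept either prefix and either extension.
alt = re.match(r"^(IMG|DSC)\d+\.(png|jpg)$", "DSC0042.jpg")
prefix, extension = alt.groups()

print(outer, inner)        # IMG39 39
print(prefix, extension)   # DSC jpg
```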

re(?=foo) – Lookahead. Match the ‘re’ that is followed by ‘foo’.
ex: \w+(?=\.)
>> hello world.
Full match: world

re(?!foo)  – Negative lookahead. Match the ‘re’ that is not followed by ‘foo’.
ex: \w+(?!\.)
>> hello.
Full match: hell (the engine backtracks by one character so that the match is followed by ‘o’, not ‘.’)

(?<=foo)re – Lookbehind. Match the ‘re’ that is preceded by ‘foo’.
ex: (?<=\?)\w+
>> ?hello
Full match: hello

(?<!foo)re  – Negative lookbehind. Match the ‘re’ that is not preceded by ‘foo’.
ex: (?<!\?)\w+
>> ?hello
Full match: ello (‘h’ is preceded by ‘?’, so the match starts at ‘e’)
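
The four lookaround examples can be reproduced with re.search, which returns the first match in the string. The last line is an extra variation, not from the examples above: adding a word boundary (\b) is one way to stop the negative lookahead from settling for a partial word.

```python
import re

lookahead = re.search(r"\w+(?=\.)", "hello world.").group()    # word followed by a period
neg_lookahead = re.search(r"\w+(?!\.)", "hello.").group()      # backtracks to a partial word
lookbehind = re.search(r"(?<=\?)\w+", "?hello").group()        # word preceded by '?'
neg_lookbehind = re.search(r"(?<!\?)\w+", "?hello").group()    # skips the 'h' after '?'

# Variation (an assumption for illustration): require a word boundary before the lookahead,
# so only whole words that are not followed by a period can match.
whole_word = re.search(r"\w+\b(?!\.)", "hello. world").group()

print(lookahead)       # world
print(neg_lookahead)   # hell
print(lookbehind)      # hello
print(neg_lookbehind)  # ello
print(whole_word)      # world
```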

(?P<name>re) – Named group (Python syntax). Matches the ‘re’ inside () and captures it as a group called ‘name’.
ex: (?P<id>\w+)
>> Python
Full match: Python
Group ‘id’: Python

(?P=name) – Backreference. Matches whatever text was matched by the earlier group named ‘name’.
ex: (?P<a>\w+) (?P=a)
>> hello hello
Full match: hello hello
Group ‘a’: hello
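
Named groups and named backreferences, sketched with Python's re module. The "hello world" input is an extra case added here to show when the backreference fails.

```python
import re

# Named group: the captured text is available by name as well as by number.
m = re.search(r"(?P<id>\w+)", "Python")
by_name = m.group("id")
as_dict = m.groupdict()

# Named backreference: the second word must repeat the first exactly.
rep = re.search(r"(?P<a>\w+) (?P=a)", "hello hello")
no_rep = re.search(r"(?P<a>\w+) (?P=a)", "hello world")

print(by_name)         # Python
print(as_dict)         # {'id': 'Python'}
print(rep.group())     # hello hello
print(rep.group("a"))  # hello
print(no_rep)          # None
```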

