# 👁🗨 Regular Expressions
If you decide to use a regex to solve your problem, you now have two problems!
# 🥉 Vocabulary
- Qualifiers, Meta-characters, Meta-classes
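- RE works on characters, not numbers: `[0-9]` is valid, but `[0-255]` is not a numeric range. It is the same character class as `[0-25]`, so it matches only a single `0`, `1`, `2`, or `5`.
- `(?` has no meaning on its own; it waits for the next character to assign a meaning: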
Pattern | Description |
---|---|
`(?:...)` | Non-capturing group: groups without returning the match |
`(?=...)` | Positive Look ahead Assertion |
`(?!...)` | Negative Look ahead Assertion |
`(?<=...)` | Positive Look behind Assertion |
`(?<!...)` | Negative Look behind Assertion |
`(?P<name>...)` | Named group |
`(?P=name)` | Backreference to a named group: matches the same text the group captured |
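
A minimal Python sketch of a few entries from the vocabulary above, using the standard `re` module. The sample strings and the group name `word` are made up for illustration.

```python
import re

# [0-255] is the character set {0, 1, 2, 5}, not the numbers 0 to 255.
class_hits = re.findall(r"[0-255]", "0123456789")
print(class_hits)

# (?P<word>...) names a group; (?P=word) must match the same text again.
doubled = re.compile(r"(?P<word>\w+) (?P=word)")
repeated = doubled.search("it is is here")
print(repeated.group("word"))
print(doubled.search("it is here"))

# (?:...) groups without capturing, so groups() stays empty.
no_groups = re.match(r"(?:ab)+", "ababab").groups()
print(no_groups)
```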
# 🥇 RE Repetition Qualifiers
- `.` In the default mode, this matches any character except a newline.
- Non-greedy variants (usually the greedy qualifier followed by `?`)
Greedy Qualifiers | Non-greedy variants |
---|---|
`ab*` | `ab*?` |
`ab+` | `ab+?` |
`ab?` | `ab??` |
`a{3,5}` | `a{3,5}?` |
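
The difference between greedy and non-greedy qualifiers is easiest to see side by side. This is a small sketch with an invented sample string; a greedy qualifier takes as much as it can, while its `?` variant takes as little as it can.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string.
greedy = re.findall(r"<.*>", text)
# Non-greedy: .*? stops at the first '>' it can.
lazy = re.findall(r"<.*?>", text)
print(greedy)
print(lazy)

# Bounded repetition follows the same rule.
most = re.match(r"a{3,5}", "aaaaaa").group()
least = re.match(r"a{3,5}?", "aaaaaa").group()
print(most, least)
```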
Common regex patterns
Regex | Meaning |
---|---|
`(.*)` | 0..N: any character (except a newline), matched 0 to N times |
`(.+)` | 1..N: any character (except a newline), matched 1 to N times |
`(.?)` | 0..1: any character (except a newline), matched either 0 or 1 time |
# 👶 Qualifiers
The question mark character `?` matches either once or zero times; you can think of it as marking something as being optional. For example, `home-?brew` matches either 'homebrew' or 'home-brew'.
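
The `home-?brew` example can be checked with a few lines of Python. This is only a sketch; the third test word is an extra case added to show that `?` allows at most one hyphen.

```python
import re

# `-?` makes the hyphen optional: zero or one occurrence.
pattern = re.compile(r"home-?brew")

for word in ["homebrew", "home-brew", "home--brew"]:
    # fullmatch requires the whole string to match the pattern.
    print(word, bool(pattern.fullmatch(word)))
```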
# 👣 Meta-characters and Meta-character Classes
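Remember them in pairs and families.

Meta-characters

- `.`, `?`, `*`
- `^`, `$`
- `[...]`, `(...)`, `{...}`
- `(?:...)`, `(?=...)`, `(?!...)`, `(?<=...)`, `(?<!...)`, `(?P<name>...)`, `(?P=name)`

Meta-character Classes (the upper-case form negates the lower-case one)

- `\w`, `\W`
- `\d`, `\D`
- `\b`, `\B`
- `\s`, `\S`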
# 4️⃣ IP pattern
Let's start with what we already know about the rules a valid IP address follows:
- 4 octets
- each octet value between `0` and `255`
- Boundary values: `0.0.0.0`, `255.255.255.255`
"""
^ # start of the string
(?:[0-9]{1,3}\.) # non-capturing group: a digit 0-9
# matched 1, 2 or 3 times, followed by a literal .
{3} # repeat the previous group exactly 3 times
[0-9]{1,3} # fourth octet: 1 to 3 digits, no trailing dot
$ # end of the string
"""
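A named group cannot be used to repeat the octet pattern: `(?P=octet)` is a backreference that repeats the text the group last captured, not its pattern. So `/^(?:(?P<octet>[0-9]{1,3})\.){3}(?P=octet)$/` matches `1.2.3.3` but not `1.2.3.4`. Write the fourth octet out again instead:
`/^(?:(?P<octet>[0-9]{1,3})\.){3}[0-9]{1,3}$/`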
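
The patterns above check the shape of an address but still accept values such as `999.999.999.999`, because `[0-9]{1,3}` knows nothing about the 0-255 rule. One possible way to enforce the range is to spell out each octet as alternatives. This is a sketch, not the only approach; the names `OCTET`, `IPV4` and `is_ipv4` are made up for this example.

```python
import re

# One octet, 0-255, spelled out digit by digit because a regex
# works on characters and cannot compare numbers.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

IPV4 = re.compile(
    rf"""
    \A                  # start of the string
    (?:{OCTET}\.){{3}}  # three octets, each followed by a literal dot
    {OCTET}             # fourth octet, no trailing dot
    \Z                  # end of the string
    """,
    re.VERBOSE,
)


def is_ipv4(text):
    """Return True if text is a dotted quad with every octet in 0-255."""
    return IPV4.match(text) is not None


for candidate in ["0.0.0.0", "255.255.255.255", "256.1.1.1", "1.2.3"]:
    print(candidate, is_ipv4(candidate))
```

The `re.VERBOSE` flag allows the same commented layout as the triple-quoted pattern above. Note that this sketch also rejects octets with leading zeros such as `01`.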
# 👗 Regex Assertions
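Assertions match based on what follows or precedes the current position, either by looking forward in the blob or by looking backward. The asserted text must be present (or absent), but it is not included in the returned match.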
# ⏩ Positive Look ahead Assertion
Consider the case where we want to match Issac only in Issac Asimov and not in Issac Newton.

Blob | Match return |
---|---|
Issac Asimov | ✔️ Issac |
Issac Newton | ❌ |

Pattern: `Issac (?=Asimov)`
# ⏪ Negative Look ahead Assertion
Reverse the above situation: we want all other Issacs which are not followed by Asimov. This time we want Issac from Issac Newton.

Blob | Match return |
---|---|
Issac Asimov | ❌ |
Issac Newton | ✔️ Issac |

Pattern: `Issac (?!Asimov)`
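
Both look ahead cases can be tried in Python with the same two blobs. This is a small sketch; the variable names are made up. Note that the match is `'Issac '` with its trailing space, because the space is part of the pattern while the asserted `Asimov` is not consumed.

```python
import re

blobs = ["Issac Asimov", "Issac Newton"]

followed_by_asimov = re.compile(r"Issac (?=Asimov)")
not_followed_by_asimov = re.compile(r"Issac (?!Asimov)")

for blob in blobs:
    pos = followed_by_asimov.search(blob)
    neg = not_followed_by_asimov.search(blob)
    # Show the matched text, or None when the assertion fails.
    print(repr(blob), "->", pos and pos.group(), "|", neg and neg.group())
```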
# ⏭ Positive Look behind Assertion
Blob | Pattern match return |
---|---|
Avi Mehenwal | ☑️ |
Shubhranshu Mehenwal | ❌ |

Consider the case where we want Mehenwal only from Avi Mehenwal and not from Shubhranshu Mehenwal.

Pattern: `(?<=Avi) Mehenwal`
# ⏮ Negative Look behind Assertion
Blob | Pattern match return |
---|---|
Avi Mehenwal | ❌ |
Shubhranshu Mehenwal | ☑️ |

Now let's reverse the situation: we want all Mehenwal which are not preceded by Avi. We want Mehenwal from Shubhranshu Mehenwal.

Pattern: `(?<!Avi) Mehenwal`
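
The two look behind patterns can be checked the same way in Python. This is a sketch with made-up variable names; the returned match is `' Mehenwal'` including the leading space, since the space belongs to the pattern and only `Avi` is asserted.

```python
import re

blobs = ["Avi Mehenwal", "Shubhranshu Mehenwal"]

after_avi = re.compile(r"(?<=Avi) Mehenwal")
not_after_avi = re.compile(r"(?<!Avi) Mehenwal")

for blob in blobs:
    pos = after_avi.search(blob)
    neg = not_after_avi.search(blob)
    # Show the matched text, or None when the assertion fails.
    print(repr(blob), "->", pos and pos.group(), "|", neg and neg.group())
```

Python's `re` module requires the text inside a look behind to have a fixed width, which `Avi` satisfies.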
# 🌹 Grep
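`grep` stands for "global regular expression print".
- `egrep` (equivalent to `grep -E`) uses extended regular expressions, which include all of the basic meta-characters along with additional meta-characters to express more complex matches. For example, count (`-c`) the lines that start with `begin` or end with `end`:
`egrep -c '^begin|end$' myfile.txt`
- Python-like lookahead and lookbehind work on the shell with `grep -P` (Perl-compatible mode in GNU grep) or `rg -P`:
`echo "Nate or nate" | grep -P '(?<=N)a'`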
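
For comparison, the two shell commands can be approximated in Python. This is a rough sketch: `count_matching_lines` and the sample lines are made up, and `re.findall` returns only the matched text, whereas `grep -P` without `-o` prints the whole matching line.

```python
import re


def count_matching_lines(lines, pattern=r"^begin|end$"):
    """Count lines that contain a match, similar to `egrep -c`."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


sample = ["begin here", "the end", "nothing", "begin to end"]
print(count_matching_lines(sample))

# Only the 'a' preceded by an upper-case 'N' is matched.
lookbehind_hits = re.findall(r"(?<=N)a", "Nate or nate")
print(lookbehind_hits)
```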
# 🏵 Resources
- https://www.regular-expressions.info/refrepeat.html
- Test and explain your RE
- One Liner Programs
- how-to-replace-perl-one-liner-regex-with-python-one-liner
- Python one Liners
- grep - global regex
- extended grep
- Debugging webapp
*[RE]: Regular Expressions | regex