# 👁🗨 Regular Expressions
If you decide to use a regex to solve your problem, you now have two problems!
# 🥉 Vocabulary
- Qualifiers, Meta-characters, Meta-classes
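- RE works on characters, not on numbers
  - `[0-9]` is valid, but `[0-255]` matches only a single character out of `0`, `1`, `2` or `5`
  - `(?` has no meaning of its own; the character that follows it decides what the group means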
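
The point that a character class works on single characters rather than numbers can be checked quickly. This is a minimal sketch using Python's `re` module; the sample string is made up for illustration.

```python
import re

# [0-255] is read as: the range 0-2, then the literal 5 (listed twice).
# It therefore matches exactly one character out of 0, 1, 2, 5.
digit_class = re.compile(r"[0-255]")

sample = "0123456789"  # hypothetical input
matched = digit_class.findall(sample)
print(matched)

# A three-character string such as "255" is not a single-character match.
whole_number = digit_class.fullmatch("255")
print(whole_number)
```
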
| Pattern | Description |
|---|---|
| `(?:...)` | Non-capturing group: groups without returning the match |
| `(?=...)` | Positive Look ahead Assertion |
| `(?!...)` | Negative Look ahead Assertion |
| `(?<=...)` | Positive Look behind Assertion |
| `(?<!...)` | Negative Look behind Assertion |
| `(?P<name>...)` | Named capturing group |
| `(?P=name)` | Backreference to a named group |
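
As an illustration of the last two rows, the sketch below names a group and then refers back to it. The group name `word` and the sample sentence are assumptions chosen for the example.

```python
import re

# (?P<word>...) names a capturing group; (?P=word) must then match
# the very same text that the group captured (not just the same pattern).
doubled = re.compile(r"\b(?P<word>\w+) (?P=word)\b")

m = doubled.search("this is is a test")  # hypothetical sentence
captured = m.group("word")   # text captured by the named group
whole = m.group(0)           # the complete match
print(captured, "|", whole)
```

This kind of pattern is handy for spotting accidentally repeated words.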
# 🥇 RE Repetition Qualifiers
- `.` In the default mode, this matches any character except a newline.
- Non-greedy variants are formed by appending `?` to a greedy qualifier.

| Greedy Qualifiers | Non-greedy variants |
|---|---|
| `ab*` | `ab*?` |
| `ab+` | `ab+?` |
| `ab?` | `ab??` |
| `a{3,5}` | `a{3,5}?` |

Common regex patterns

| Regex | Meaning |
|---|---|
| `(.*)` | 0..N: any character (except newline), matched zero or more times |
| `(.+)` | 1..N: any character, matched one or more times |
| `(.?)` | 0..1: any character, matched zero or one time |
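
The difference between greedy and non-greedy qualifiers is easiest to see side by side. A minimal sketch, assuming a small made-up HTML-like string:

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # hypothetical sample text

greedy = re.findall(r"<.+>", html)   # .+ grabs as much as possible
lazy = re.findall(r"<.+?>", html)    # .+? stops at the first possible '>'

print(greedy)
print(lazy)
```

The greedy pattern returns one match spanning the whole string, while the non-greedy one returns each tag separately.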
# 👶 Qualifiers
The question mark character `?` matches either once or zero times;
you can think of it as marking something as being optional.
For example, `home-?brew` matches either 'homebrew' or 'home-brew'.
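
The optional hyphen can be verified with a short check. The third candidate string is an extra case added here to show that `?` allows at most one occurrence.

```python
import re

pattern = re.compile(r"home-?brew")

candidates = ["homebrew", "home-brew", "home--brew"]
# fullmatch requires the whole string to match the pattern
results = {text: bool(pattern.fullmatch(text)) for text in candidates}
print(results)
```
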
# 👣 Meta-characters and Meta-character Classes
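Remember the classes in pairs: the uppercase form negates the lowercase one.
Meta-characters
`.`, `?`, `*`, `+`, `^`, `$`, `[...]`, `(...)`, `{...}`, `(?:...)`, `(?=...)`, `(?!...)`, `(?<=...)`, `(?<!...)`, `(?P<name>...)`, `(?P=name)`
Meta-character Classes
`\w`/`\W`, `\d`/`\D`, `\s`/`\S`. Note that `\A` is not part of a pair: it anchors the start of the string, while `\a` is the bell character.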
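
A quick way to get a feel for the class pairs is to run each one over the same text. This is a sketch with an invented sample string, not an exhaustive tour of the classes.

```python
import re

text = "Room 42, Floor 7"  # hypothetical sample

digits = re.findall(r"\d+", text)       # runs of digits
non_digits = re.findall(r"\D+", text)   # runs of everything that is not a digit
words = re.findall(r"\w+", text)        # runs of letters, digits and underscore
fields = re.split(r"\s+", text)         # split on runs of whitespace

print(digits, non_digits, words, fields, sep="\n")
```
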
# 4️⃣ IP pattern
Let's start with what we already know about the rules for forming an IP address
- 4 octets
- each octet has a value between 0 and 255
- Boundary values: `0.0.0.0`, `255.255.255.255`
"""
^                   # start of the string
(?:[0-9]{1,3}\.)    # non-capturing group: a run of 1, 2 or 3 digits (0-9)
                    # followed by a literal .
{3}                 # repeat the previous group exactly 3 times
[0-9]{1,3}          # the fourth octet, without a trailing .
$                   # end of the string
"""
Note that `[0-9]{1,3}` only checks the number of digits, so this pattern also accepts out-of-range octets such as `999`.
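Using a named group for the octet. A backreference such as `(?P=octet)` repeats the text the group captured, not its pattern, so it cannot stand in for the fourth octet; the sub-pattern is written out again instead:
`^(?:(?P<octet>[0-9]{1,3})\.){3}[0-9]{1,3}$`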
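
To enforce the 0-255 rule from the list of IP rules, each octet can be spelled out as alternatives. The sketch below is one possible way to do it; the names `OCTET`, `IPV4` and `is_ipv4` are hypothetical, and rejecting leading zeros (such as `01`) is a design assumption rather than a requirement stated above.

```python
import re

# One octet in the range 0-255, written out as alternatives:
#   25[0-5]        -> 250-255
#   2[0-4][0-9]    -> 200-249
#   1[0-9][0-9]    -> 100-199
#   [1-9]?[0-9]    -> 0-99 (no leading zero)
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# Three "octet." groups followed by a final octet.
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")


def is_ipv4(text: str) -> bool:
    """Return True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None


for candidate in ["0.0.0.0", "255.255.255.255", "256.1.1.1", "1.2.3"]:
    print(candidate, is_ipv4(candidate))
```

Using `fullmatch` plays the role of the `^` and `$` anchors in the pattern above.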
# 👗 Regex Assertions
Assertions match based on what follows (look ahead) or precedes (look behind) the current position in the blob, without making that surrounding text part of the returned match.
# ⏩ Positive Look ahead Assertion
Consider the case where you want to match Isaac only when it is followed by Asimov, not when it is followed by Newton.

| Blob | Match returned |
|---|---|
| Isaac Asimov | ✔️ Isaac |
| Isaac Newton | ❌ |

`Isaac (?=Asimov)`
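
A short check of the positive look ahead with Python's `re` module. It uses the standard spelling "Isaac" and shows that the text inside the assertion is tested but not included in the match.

```python
import re

pattern = re.compile(r"Isaac (?=Asimov)")

hit = pattern.search("Isaac Asimov")
miss = pattern.search("Isaac Newton")

matched_text = hit.group(0)  # "Asimov" is checked but not consumed
print(repr(matched_text), miss)
```
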
# ⏪ Negative Look ahead Assertion
Now reverse the situation: we want every Isaac that is not followed by Asimov. This time we want Isaac from Isaac Newton.

| Blob | Match returned |
|---|---|
| Isaac Asimov | ❌ |
| Isaac Newton | ✔️ Isaac |

`Isaac (?!Asimov)`
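
The negative look ahead can be combined with a capturing group to see which surnames pass the filter. A sketch on an invented sentence, using the standard spelling "Isaac"; the extra `(\w+)` group is an addition for illustration.

```python
import re

text = "Isaac Asimov wrote fiction; Isaac Newton wrote the Principia."  # hypothetical

# Match "Isaac " only when the next word is not Asimov, then capture that word.
surnames = re.findall(r"Isaac (?!Asimov)(\w+)", text)
print(surnames)
```
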
# ⏭ Positive Look behind Assertion
Consider the case where we want Mehenwal only from Avi Mehenwal and not from Shubhranshu Mehenwal.

| Blob | Pattern match returned |
|---|---|
| Avi Mehenwal | ☑️ |
| Shubhranshu Mehenwal | ❌ |

`(?<=Avi) Mehenwal`
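
The positive look behind can also drive a substitution, since only the matched part is replaced and the text inside the assertion is left alone. A minimal sketch; the abbreviation `" M."` is an arbitrary replacement chosen for the example. In Python's `re` module the look behind pattern must have a fixed width, which `Avi` satisfies.

```python
import re

pattern = re.compile(r"(?<=Avi) Mehenwal")
text = "Avi Mehenwal and Shubhranshu Mehenwal"

found = pattern.findall(text)          # only the surname preceded by "Avi"
shortened = pattern.sub(" M.", text)   # replace just that occurrence
print(found, shortened)
```
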
# ⏮ Negative Look behind Assertion
Now let's reverse the situation: we want every Mehenwal that is not preceded by Avi. We want Mehenwal from Shubhranshu Mehenwal.

| Blob | Pattern match returned |
|---|---|
| Avi Mehenwal | ❌ |
| Shubhranshu Mehenwal | ☑️ |

`(?<!Avi) Mehenwal`
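
The negative look behind works well as a filter over a list of names. A small sketch using the two names from the table:

```python
import re

pattern = re.compile(r"(?<!Avi) Mehenwal")
names = ["Avi Mehenwal", "Shubhranshu Mehenwal"]

# Keep only the names where the surname is NOT preceded by "Avi".
kept = [name for name in names if pattern.search(name)]
print(kept)
```
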
# 🌹 Grep
`grep` stands for Global Regular Expression Print.
- `egrep` (equivalent to `grep -E`) uses extended regular expressions, which include all of the basic meta-characters along with additional meta-characters to express more complex matches.
egrep -c '^begin|end$' myfile.txt
Python-like (Perl-compatible) lookahead and lookbehind can be used on the shell with `grep -P` or `rg -P`:
echo "Nate or nate" | grep -P '(?<=N)a'
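
For comparison, the `egrep -c '^begin|end$'` count shown earlier in this section can be approximated in Python. This is a rough sketch: `count_matching_lines` and the sample lines are hypothetical, and like `-c` it counts matching lines rather than individual matches.

```python
import re


def count_matching_lines(lines, pattern=r"^begin|end$"):
    """Count lines that start with 'begin' or end with 'end'."""
    regex = re.compile(pattern)
    # Strip the trailing newline so $ lines up with the end of the text.
    return sum(1 for line in lines if regex.search(line.rstrip("\n")))


sample = ["begin here\n", "the end\n", "nothing\n", "begin to end\n"]
count = count_matching_lines(sample)
print(count)
```

To run it against a real file, the list could be replaced by an open file object, which yields lines in the same way.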
# 🏵 Resources
- https://www.regular-expressions.info/refrepeat.html
- Test and explain your RE
- One Liner Programs
- how-to-replace-perl-one-liner-regex-with-python-one-liner
- Python one Liners
- grep - global regex
- extended grep
- Debugging webapp
*[RE]: Regular Expressions | regex