How To Use Regular Expressions (Regex)

“Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.” – Jamie Zawinski

Every now and then a requirement involves parsing data, and I need some form of regular expression (regex): a syntax for searching for patterns in a string or a set of strings. At first the syntax looks intimidating, and most people shy away from it and resort to writing custom functions instead.

Regex is a must-have skill: it applies to a wide range of tasks that involve searching or parsing, and it is available in most programming languages as well as in shells, text editors, and IDEs. I use it mostly when writing front-end JavaScript validation and back-end logic in Apex, Node.js, Java/Groovy, Swift, and Python.

I created this cheat sheet that covers the basics and some handy tips.

Flags

In many tools and languages the search pattern is delimited by two slash characters, as in /abc/. After the closing slash we can specify a combination of the following flags; which of them are available depends on the regex flavor.

  • g (global) – don’t return after the first match
  • m (multi-line) – ^ and $ match the start/end of each line
  • i (insensitive) – case-insensitive match
  • x (extended) – ignore whitespace in the pattern
  • X (eXtra) – disallow meaningless escapes
  • s (single line) – dot also matches a newline
  • u (unicode) – match with full Unicode support
  • U (Ungreedy) – make quantifiers lazy by default
  • A (Anchored) – the pattern must match at the start of the subject string
  • J (Jchanged) – allow duplicate subpattern names
  • D (Dollar end only) – $ matches only at the very end of the subject string
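
Not every language uses the /pattern/flags notation. As a rough illustration, here is how a few of the flags above map onto Python's built-in re module, where flags are passed as constants instead of trailing letters. The sample text and variable names are made up for this sketch, and only the flags that Python supports directly are shown.

```python
import re

text = "Apple pie\napple tart\nBANANA split"

# i (insensitive): re.IGNORECASE makes "apple" match "Apple" too.
# g (global): re.findall returns every match, not just the first one.
apples = re.findall(r"apple", text, flags=re.IGNORECASE)

# m (multi-line): ^ matches at the start of every line, not only the string.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)

# s (single line): re.DOTALL lets the dot match newline characters as well.
with_dotall = re.search(r"pie.apple", text, flags=re.DOTALL) is not None
without_dotall = re.search(r"pie.apple", text) is not None

# x (extended): re.VERBOSE ignores whitespace and allows comments in the pattern.
word_pair = re.compile(
    r"""
    (\w+)   # first word
    \s+     # separating whitespace
    (\w+)   # second word
    """,
    flags=re.VERBOSE,
)
pair = word_pair.match("BANANA split").groups()

print(apples, first_words, with_dotall, without_dotall, pair)
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.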


Anchors

Quantifiers

Or and Brackets

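Inside a bracket expression, most special characters lose their special meaning: for example . * + and ? are matched literally. The exceptions are ], ^ (when it comes first) and - (between two characters). Backslash handling varies by flavor: POSIX bracket expressions treat \ as a literal character, while most other flavors (PCRE, JavaScript, Java, Python) still use it to escape.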
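
How brackets treat special characters is easiest to see by trying it. The sketch below uses Python's re module, so it shows that flavor's behaviour only; in particular, Python still treats the backslash as an escape inside brackets, which is not true of POSIX tools. The sample strings are made up.

```python
import re

# Inside brackets, '.', '+' and '*' are literal characters, not operators.
operators = re.findall(r"[.+*]", "a.b+c*d")

# Python's re still honours backslash escapes inside brackets,
# so this class matches a literal ']' or a literal backslash.
specials = re.findall(r"[\]\\]", r"x]y\z")

# '^' negates the class only when it is the first character.
not_digits = re.findall(r"[^0-9]", "a1b2")

# Elsewhere '^' is literal, and a trailing '-' is a literal dash, not a range.
caret_or_dash = re.findall(r"[a^-]", "a^-b")

print(operators, specials, not_digits, caret_or_dash)
```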

Character Classes

To be matched literally, the characters ^ . [ $ ( ) | * + ? { \ must be escaped with a backslash \, as they otherwise have special meaning.
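
A small Python sketch of why escaping matters, using a made-up price string. It also shows re.escape(), a standard-library helper that adds the backslashes for you, which is handy when the literal text comes from user input.

```python
import re

price_text = "Total: $4.99 (approx.)"

# Unescaped, '$' is an end-of-string anchor and '.' matches any character,
# so this pattern cannot match the price.
unescaped = re.search(r"$4.99", price_text)

# Escaping by hand makes both characters literal.
escaped = re.search(r"\$4\.99", price_text)

# re.escape() inserts the backslashes automatically.
literal = "(approx.)"
auto = re.search(re.escape(literal), price_text)

print(unescaped, escaped.group(), auto.group())
```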

Groupings

Greedy and Lazy Quantifiers

The quantifiers (* + ? {}) are greedy by default, so they expand the match as far as they can through the provided text. Appending ? to a quantifier (*? +? ?? {}?) makes it lazy, so it matches as little as possible.
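
The difference between greedy and lazy matching shows up clearly on tag-like text. Below is a minimal Python sketch with a made-up HTML snippet; it is meant only to show the matching behaviour, not as a way to parse HTML.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so the match runs to the last '>'.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r"<.+?>", html)

print(greedy)
print(lazy)
```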

Boundaries

Back References – \1

Look-ahead and Look-behind

Top Regular Expressions

Summary

As we’ve seen, regex is powerful and widely applicable. Listed below are a few of the things you can do with regex in your projects.

  • input and data validation
    • validate user input in forms
    • validate data before applying logic or saving to a database
    • validate JSON schema
  • replacing values – replace specific data in a string
  • text parsing – e.g. retrieve only specific pieces of data from a string or URL, or split on delimiters
  • find and replace – e.g. in an IDE, use regex to search for particular patterns and replace them
  • web scraping – look for specific patterns in the data to scrape
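
To make a few of these uses concrete, here is a small Python sketch covering validation, parsing, and replacement. The username rule, the function names, and the example URL are all hypothetical and chosen only for illustration. For real URL handling, the standard urllib.parse module is usually the safer choice.

```python
import re

# Hypothetical, deliberately simple rule: 3 to 16 letters, digits or underscores.
USERNAME = re.compile(r"[A-Za-z0-9_]{3,16}")


def is_valid_username(value: str) -> bool:
    """Validation: fullmatch requires the whole string to fit the pattern."""
    return USERNAME.fullmatch(value) is not None


def query_params(url: str) -> dict:
    """Parsing: capture key=value pairs that follow '?' or '&'."""
    return dict(re.findall(r"[?&]([^=&#]+)=([^&#]*)", url))


def mask_digits(text: str) -> str:
    """Replacement: swap every digit for '#'."""
    return re.sub(r"\d", "#", text)


print(is_valid_username("jane_doe"))
print(query_params("https://example.com/search?q=regex&page=2"))
print(mask_digits("Card 1234"))
```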

Sample Code

Apex class that removes whitespace not enclosed in quotes
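
The Apex source is not shown here, so as a rough illustration of the same idea, here is a Python sketch. It assumes that quoted text means double-quoted strings without escaped quotes inside them; the function name is made up. The approach is a single alternation: match either a quoted string, which is kept, or a run of whitespace, which is dropped.

```python
import re

# Try a double-quoted string first; otherwise match a run of whitespace.
QUOTED_OR_SPACE = re.compile(r'("[^"]*")|\s+')


def strip_unquoted_whitespace(text: str) -> str:
    """Remove whitespace everywhere except inside double-quoted strings."""
    # Group 1 is set only for quoted strings; whitespace matches become "".
    return QUOTED_OR_SPACE.sub(lambda m: m.group(1) or "", text)


print(strip_unquoted_whitespace('a = "hello world" + b'))
```

Because the quoted alternative comes first, the regex engine consumes a whole quoted string before it gets a chance to see the spaces inside it.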

Python script that crawls pages that match the pattern
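
The original script is not reproduced here. What follows is only a minimal sketch of how such a crawler might look, under a few assumptions: links are found with a simple href regex (a real crawler would more likely use an HTML parser), and the page download is passed in as a fetch callable so the logic can be tried without network access. All names are hypothetical.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Simplistic link extractor: captures the value of href="..." or href='...'.
HREF = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def matching_links(html, base_url, url_pattern):
    """Return absolute links found in html whose URL matches url_pattern."""
    links = (urljoin(base_url, href) for href in HREF.findall(html))
    return [link for link in links if re.search(url_pattern, link)]


def crawl(start_url, url_pattern, fetch, max_pages=10):
    """Breadth-first crawl that only follows URLs matching url_pattern.

    fetch is any callable that takes a URL and returns the page's HTML text.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in matching_links(fetch(url), url, url_pattern):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Usage with an in-memory stand-in for real pages.
pages = {
    "https://example.com/blog/": '<a href="/blog/post-1">1</a> <a href="/about">About</a>',
    "https://example.com/blog/post-1": '<a href="/blog/post-2">2</a>',
    "https://example.com/blog/post-2": "",
}
result = crawl("https://example.com/blog/", r"/blog/post-\d+$", lambda u: pages.get(u, ""))
print(result)
```

To crawl live pages, fetch could wrap urllib.request.urlopen or a similar HTTP client.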

And finally, as a takeaway: just learn the syntax and hack away.
