Regular Expression (Regex)
Regular expressions (Regex) are not a programming language. Instead, they are a pattern syntax, supported by many different languages and tools, for finding, manipulating, or replacing patterns in text. The word “pattern” is key to understanding Regex: a regular expression is a way to specify the pattern in the data that you are looking for, so that every occurrence of it, such as a particular sequence of letters forming a specific word, can be found. Regex can be used to clean, validate, transform, and extract data, to specify which parts of a webpage you want to scrape, and more.
Examples
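The uses described above can be sketched in a few lines. The snippet below is a minimal illustration using Python's built-in re module; the sample strings, the date pattern, and the helper name looks_like_date are assumptions made up for this example, not taken from any particular dataset or tool.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Contact: ana@example.org, 2021-03-15; backup: li@example.com, 2022-11-02."

# Find: every ISO-style date (four digits, two digits, two digits).
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# Extract: a capture group (the parentheses) pulls out just the year.
years = [m.group(1) for m in re.finditer(r"(\d{4})-\d{2}-\d{2}", text)]

# Validate: fullmatch succeeds only if the whole string fits the pattern.
def looks_like_date(s):
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None

# Clean: collapse runs of whitespace into single spaces,
# as one might do with text produced by OCR.
messy = "Regular   expressions \t find\n patterns"
clean = re.sub(r"\s+", " ", messy).strip()

print(dates)
print(years)
print(looks_like_date("2021-03-15"), looks_like_date("15 March 2021"))
print(clean)
```

Note that the date pattern checks only the shape of the string (digits and hyphens), not whether the date is real: "2021-99-99" would also pass. Other languages write the same patterns with slightly different function names and escaping rules.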
Further Resources
- Fox (2017) “Regex Tutorial—A Quick Cheatsheet by Examples”
- Regular Expression (Wikipedia)
- Turner O’Hara (2021) “Cleaning OCR’d Text with Regular Expressions”