Regex Operations in BigQuery

Developer building products for Fortune 500!
BigQuery is taking the analytics industry by storm. One of the most important use cases is to extract and transform parts of a string. Below are some of the powerful constructs provided by BigQuery.
BigQuery provides the following constructs for regex operations.
| Construct | Arguments | Use |
| REGEXP_CONTAINS | REGEXP_CONTAINS(value, regexp) | Checks if the string contains the regex |
| REGEXP_EXTRACT | REGEXP_EXTRACT(value, regexp[, position[, occurrence]]) | Extract the substring that matches the regex. |
| REGEXP_EXTRACT_ALL | REGEXP_EXTRACT_ALL(value, regexp) | Return all substrings that match the regex |
| REGEXP_INSTR | REGEXP_INSTR(source_value, regexp [, position[, occurrence, [occurrence_position]]]) | Return 1-based index for the first occurrence of the regex |
| REGEXP_REPLACE | REGEXP_REPLACE(value, regexp, replacement) | Replace all substrings in value |
Source: Google Cloud Platform
Table Setup
Let’s create a simple table with some string data to play around with.
-- Create Store
CREATE OR REPLACE TABLE `gcp-tour-276710.thefellowcoder.store`
(
date DATE,
details STRING,
address STRING
)
-- Add values
INSERT INTO `gcp-tour-276710.thefellowcoder.store`
VALUES
('2021-07-14', '<h1>Nike Shoes</h1><p>Base: $450 Addon: $20</p><p>Support email: help@nike.com</p>','Los Angeles, California 90001 '),
('2021-06-13', '<h1>Rebook Shoes</h1><p>Base: $150 Addon: $10</p><p>Support email: help@rebook.com</p>','Sacramento, California 94203 '),
('2021-07-11', '<h1>Adidas Shoes</h1><p>Base: $350 Addon: $30</p><p>Support email: help@adidas.com</p>','Beverly Hills, California 90209 '),
('2021-07-11', '<h1>Carnival All Rounder Shoes</h1><p>Base: $250 Addon: $0</p>','Beverly Hills, California 90209 ')
Sample Queries
Let's see each of the constructs in action
- REGEXP_CONTAINS example
-- All stores that have a support email
select * from `gcp-tour-276710.thefellowcoder.store`
where REGEXP_CONTAINS(details, r'\S+@\S+\.\S+')
- REGEXP_EXTRACT example
-- Extract Product Names
select REGEXP_EXTRACT(details, r'<h1>([^<]+)<\/h1>')
from `gcp-tour-276710.thefellowcoder.store`
- REGEXP_EXTRACT_ALL example
-- Extract all prices
select REGEXP_EXTRACT_ALL(details, r'\$(.\d+)')
from `gcp-tour-276710.thefellowcoder.store`
- REGEXP_INSTR example
-- Get index of email
select details, REGEXP_INSTR(details, r'\S+@\S+\.\S+')
from `gcp-tour-276710.thefellowcoder.store`
- REGEXP_REPLACE example
-- Replace all email-ids
select REGEXP_REPLACE(details, r'\S+@\S+\.\S+', 'admin@thefellowcoder.com') from `gcp-tour-276710.thefellowcoder.store`
Common Use-cases:
Extract part of the string with regex
This is one of the most common use cases wherein we must extract a substring that matches a regex.
-- Extract product names from all details
select REGEXP_EXTRACT(details, r'<h1>([^<]+)<\/h1>')
from `gcp-tour-276710.thefellowcoder.store`
Replace regex with string
Let’s say you want to sanitize the string and remove all special characters from the string.
-- remove special characters
select details,REGEXP_REPLACE(details, r'[^0-9a-zA-Z]+',' ')
from `gcp-tour-276710.thefellowcoder.store`
Matches any of the regex patterns
check for multiple regex patterns for a string
-- check for pincodes starting with 90/94
select address,REGEXP_EXTRACT(address, r'9[0|4]\d+') as pincode
from `gcp-tour-276710.thefellowcoder.store`
Useful regex examples for data cleaning.
| String | Regex | Output |
| <h1>Hello World!</h1> | <h1>([^<]+)<\/h1> | Hello World! |
| <p>Support email: thefellowcoder@gmail.com</p> | \w+@\w+\.\w+ | thefellowcoder@gmail.com |
| Rebook ShOes | (?i)shoes | ShOes |
| This,contains$special\characters | [^0-9a-zA-Z]+ | This contains special characters |
| String needs to be trimmed. | ^(\s+).*?(\s+)$ | String needs to be trimmed. |




