In this tutorial, I have explained the building blocks of Regular Expressions (or REGEX), so that you can understand them and use them in Google Analytics (including GA4) and Google Tag Manager.
What is a Regular Expression?
Regular expressions, also called regex, are used to check for a pattern in a string.
For example, ^Colou?r$ is a regular expression that matches both the strings ‘Color’ and ‘Colour’.
A regex is made up of characters and metacharacters.
Note: The regex used in this article is for JavaScript.
What are Google Analytics Regex (Regular Expressions)?
Google Analytics uses JavaScript regex, which is used to perform advanced matching and substitution operations that would be difficult or impossible to achieve using other methods.
You can create more sophisticated and accurate reports in Google Analytics by using regex. You can carry out advanced data analysis.
Overall, using regex in GA can help you get more value out of your data.
What are the different types of Regular Expressions (aka Regex Engines)?
Regular expressions are categorized based on the type of syntax and computer language being used for their creation.
Implementation of regex functionality using a particular type of syntax and computer language is called a regex engine.
There are many types of regex engines available. The most popular among them are:
JavaScript
PHP
Python
Ruby
Java
C++
Golang
.NET
Different regex engines support different types of syntax, and the meaning of metacharacters (characters with special meanings in a regex) may change depending on the regex engine used.
Thus, a regular expression considered valid under one regex engine may not be considered valid under another.
Whenever you test a regex using a regex tester tool (like regex101.com), you get the option to select the flavor (aka regex engine) under which you want to test your regular expression:
Since the regex engine used by Google Analytics and Google Tag Manager is JavaScript, you should always select ‘JavaScript’ as the flavor before testing your regular expressions for GA/GTM.
What is ‘partially matches’ regex?
A ‘partially matches’ regex is a regex that matches when the pattern occurs anywhere within a string, rather than requiring the whole string to match.
Let us suppose you provided the regex ‘car’.
This regex partially matches the following strings: ‘carbohydrates’, ‘carbon’, ‘caramel’, ‘caravan’, ‘cardiac’ etc.
You don’t need to use metacharacters for a partially matches regex.
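As a minimal sketch of partial matching in JavaScript (the word list is a hypothetical sample, not data from GA), an unanchored regex matches wherever the pattern occurs in a string:

```javascript
// Hypothetical sample values; "scar" and "truck" are added for contrast.
const words = ["carbohydrates", "carbon", "caramel", "scar", "truck"];

// No anchors and no metacharacters: "car" may appear anywhere in the string.
const partial = /car/;
const partialMatches = words.filter((word) => partial.test(word));

console.log(partialMatches);
```

Note that ‘scar’ is matched too, because the pattern is not tied to the start of the string.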
If you want to create a filtered reporting view in GA4 and you have access to GA4 360 (paid version of GA4), then create a subproperty.
A subproperty is like a typical GA4 property, but it gets its data from another property (also called the source property).
To create a subproperty, you will need to use event filter(s). To create an event filter, you will need to specify one or more conditions. You can use regex while defining the conditions:
#2 Setting up site search tracking without query parameters
However, there could be a situation in which the site search feature is installed on your website in such a way that the default site search tracking feature provided by GA4 won’t work for you.
Custom events are the events that you create and use.
Custom events can be any interaction on your website that is not tracked by default.
For example, button clicks, sign-up events, form submissions, etc.
When you set up a custom event via GTM, you create a trigger that fires when the event occurs, and all trigger conditions are true.
You can use regex while creating the trigger conditions:
#6 Setting up Content groups in GA4
In the context of GA4, a content group is a set of web pages that are based on the same theme.
So in the case of a blog, a content group can be a set of web pages based on the same topic, e.g. Attribution Modelling.
In the case of an ecommerce website, a content group can be a set of web pages that sell similar products, e.g. shoes.
Content groups allow you to measure the performance of a set of web pages at the content category or product category level.
Content groups are especially useful if you have a big website with hundreds or thousands of web pages. You can realistically measure the web pages’ performance only at the group level and not at the individual page level.
While setting up content groups in GA4, you will need to identify all the web pages which will be part of the content group. You can identify such web pages via regular expressions.
For example:
In order to identify all the web pages on my website that belong to the ‘Attribution Modelling’ content group, I can use the following regular expression:
In the context of GA4, an audience is a group of users that you can club together based on any combinations of attributes or experiences in a particular time frame.
The audiences feature in GA4 allows you to segment your users based on the dimensions, metrics, and events important to your business.
While creating/editing an audience, you need to set up one or more conditions that define the audience criteria. You can use regex while setting up these conditions:
What are the advantages of using REGEX in GA3 (Universal Analytics)?
There are many cases where regular expressions are very useful in GA3. Some such cases are:
Setting up a goal which can match multiple-goal pages instead of one.
Setting up a funnel in which a funnel step can match multiple pages instead of one.
Excluding traffic from an IP address range via filters.
Setting up complex custom segments.
Understanding the commercial value of long-tail keywords.
Rewriting URLs in GA3 reports.
Filtering data based on complex patterns within the GA3 reporting interface.
Finding referrer spam in Google Analytics.
Blocking spam referrers through the custom advanced filter in Google Analytics.
Using regex while creating content groups in Google Analytics.
Using regex while creating Channel grouping in Google Analytics.
Using regex in the table filter.
Using regex in dashboard widgets.
Using regex while building audiences.
Using regex while tracking site search without query parameter.
Using regex while debugging Google Analytics tracking issues.
#1 Setting up a goal which can match multiple-goal pages instead of one.
You can create one goal that matches multiple pages.
Suppose that after completing a transaction or generating a lead, your users are redirected to a thank-you page, and that these thank-you pages have different URLs, like /product/thank-you/ and /product2/thank-you/.
In this case, we can create one goal in Google Analytics that covers every thank-you page, as below:
Destination URL matches regex thank\-you\/$
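The goal regex can be sanity-checked in JavaScript before it is saved. This is only a sketch: the page paths are hypothetical, and the third one is added to show what the $ anchor excludes.

```javascript
// The goal regex from above; "$" ties the match to the end of the page path.
// The backslash before "-" is not required here, but it is harmless.
const goalPattern = /thank\-you\/$/;

// Hypothetical page paths used only for illustration.
const pagePaths = [
  "/product/thank-you/",
  "/product2/thank-you/",
  "/product/thank-you/faq/",
];

const goalPages = pagePaths.filter((path) => goalPattern.test(path));
console.log(goalPages);
```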
#2 Setting up a funnel in which a funnel step can match multiple pages instead of one.
For example, in a standard sales funnel, a user lands on a home page which is the first step of the funnel.
After landing on the home page, the user may go to various categories in search of specific products.
In this case, different category pages will have different URLs, and if you want to add all your category URLs into the sales funnels, the regex should be your first option to go with.
In fact, when you set up a funnel, all URLs are treated as regular expressions:
Metacharacters are the building blocks of a regex.
These are the characters that have special meanings in a regex.
Following are the examples of metacharacters for the JavaScript regex:
Other Meta Characters for JavaScript Regex:
Metacharacter – Forward Slash
Forward Slash (/) has a special meaning in a regex.
It is used to mark the beginning and end of a regular expression.
For example:
/shop/
The regular expression /shop/ matches the pattern ‘shop’ in a string.
So this regular expression will match the following strings:
“I’m going to the shop to buy some milk.”
“The shop is open from 9am to 5pm.”
“I need to stop by the shop to pick up some bread.”
These strings all contain the exact substring ‘shop’, so the regular expression would match them.
Note that this regular expression is not anchored, so it matches any string that contains ‘shop’ as a substring.
It will therefore also match ‘shopping’, ‘shopper’, or any other string that contains ‘shop’. To match only the exact string ‘shop’, anchor the pattern at both ends: /^shop$/.
If you want to match ‘shop’ followed by at least one more character, you can use the . character, which matches any single character (except for the newline).
/shop./
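A quick way to see how these patterns differ is to run them against a few strings. The sketch below assumes plain JavaScript regex literals; the variable names are illustrative only.

```javascript
const containsShop = /shop/;   // 'shop' anywhere in the string
const exactlyShop = /^shop$/;  // the whole string must be 'shop'
const shopPlusOne = /shop./;   // 'shop' followed by any one character

const shopChecks = {
  containsShopping: containsShop.test("shopping"), // true
  exactShop: exactlyShop.test("shop"),             // true
  exactShopping: exactlyShop.test("shopping"),     // false
  plusOneShop: shopPlusOne.test("shop"),           // false: nothing follows 'shop'
  plusOneShops: shopPlusOne.test("shops"),         // true
};

console.log(shopChecks);
```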
/^[a-z]+$/
In this example, the regular expression /^[a-z]+$/ is used to match a string that consists only of lowercase letters.
The ^ character indicates the beginning of the string, and the $ character indicates the end of the string.
The [a-z] character class matches any lowercase letter, and the + character indicates that one or more of the preceding characters should be matched.
Here are some examples of strings that would match this regular expression:
“abc”
“def”
“ghijklmnopqrstuvwxyz”
These strings all consist only of lowercase letters, so they would be matched by the regular expression.
The regex /^[a-z]+$/ will not match strings “abc123” or “ABC” because they both contain characters other than lowercase letters.
/colou?r/
The regular expression /colou?r/ is a pattern that matches the string ‘colour’ or ‘color’.
It uses the metacharacter ?, which indicates that the preceding character or character class should be matched 0 or 1 time.
In this case, the ? character is placed after the ‘u’ in ‘colou’, indicating that the ‘u’ is optional.
This means that the regular expression will match both the string ‘colour’ and the string ‘color’.
Metacharacter – Back Slash
‘\’ is the escaping character (also known as back slash) that is used to escape from the normal way a subsequent character is interpreted.
Through escaping character, you can convert a regular character into a metacharacter or turn a metacharacter into a regular character.
‘n’ is a regular character.
But if you add escaping character (back slash) before it, then it would become a metacharacter: \n, which is a new line character.
If you use the regex /abcd\n/, it won’t match the string abcd\n3456 (where \n is a literal backslash followed by the letter ‘n’) because the regex treats \n as a newline character instead of as regular characters.
Using the regex /abcd\\n/ will match the string abcd\n3456 because \\ matches a literal backslash, so the ‘n’ that follows is treated as a regular character instead of as part of the newline metacharacter.
‘s’ is a regular character.
But if you add escaping character (back slash) before it, then it would become a metacharacter: \s, which is used to check for whitespace characters.
The regular expression /\s/ will match any white space character in the string “Hello world!“.
Using the regex /abcd\s/ won’t match the string abcd\s3456 (where \s is a literal backslash followed by the letter ‘s’) because the regex treats \s as the whitespace metacharacter instead of as regular characters.
Using the regex /abcd\\s/ will match the string abcd\s3456 because \\ matches a literal backslash, so the ‘s’ that follows is treated as a regular character instead of as part of the metacharacter.
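The escaping rules above can be checked with a short JavaScript sketch. One assumption to keep in mind: inside a JavaScript string literal, a literal backslash is itself written as "\\", so the test strings look slightly different from how they are printed in the text.

```javascript
// "abcd\\n3456" holds: a b c d, one literal backslash, n, 3 4 5 6.
const literalBackslashN = "abcd\\n3456";
// "abcd\n3456" holds a real newline character after "abcd".
const realNewline = "abcd\n3456";

const escapeChecks = {
  newlineVsLiteral: /abcd\n/.test(literalBackslashN),    // false
  escapedVsLiteral: /abcd\\n/.test(literalBackslashN),   // true
  newlineVsNewline: /abcd\n/.test(realNewline),          // true
  whitespaceVsSpace: /abcd\s/.test("abcd 3456"),         // true
  whitespaceVsLiteral: /abcd\s/.test("abcd\\s3456"),     // false
  escapedSVsLiteral: /abcd\\s/.test("abcd\\s3456"),      // true
};

console.log(escapeChecks);
```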
How to make forward slash a regular character?
If you want regex to treat forward slash as a forward slash and not some special character, then you need to use it along with the escaping character (back slash) like this: \/
So if you want to check for a pattern, say /shop in the string /shop/collection/men/
then your regex should be: /\/shop/
Using the regex //shop/ won’t work for the string /shop/collection/men/ because the unescaped forward slash would be treated as a regex delimiter instead of a regular forward slash (in JavaScript code, // even starts a comment).
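In JavaScript code there are two common ways to write this pattern. The sketch below is illustrative; the variable names are assumptions, and the page path is the one used in the text.

```javascript
const pagePath = "/shop/collection/men/";

// In a regex literal the forward slash must be escaped, because an unescaped
// "/" would end the literal.
const literalForm = /\/shop/;

// With the RegExp constructor the pattern is a plain string, so "/" has no
// special meaning there and needs no escaping.
const constructorForm = new RegExp("/shop");

const literalMatches = literalForm.test(pagePath);
const constructorMatches = constructorForm.test(pagePath);

console.log(literalMatches, constructorMatches);
```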
How to make ‘?’ a regular character?
‘?‘ is a metacharacter.
To make it a regular character, you need to add the escaping character before it: \?
So if you want to check for a question mark in the string colou?r
then your regex should be: /colou\?r/
If you use the regex /colou?r/, it would match the string color or colour and not colou?r as then ? will be treated as a metacharacter.
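The difference between the two patterns can be sketched in JavaScript as follows (variable names are illustrative):

```javascript
const optionalU = /colou?r/;            // "?" makes the preceding "u" optional
const literalQuestionMark = /colou\?r/; // "\?" matches an actual question mark

const questionMarkChecks = {
  optionalColor: optionalU.test("color"),               // true
  optionalColour: optionalU.test("colour"),             // true
  optionalLiteral: optionalU.test("colou?r"),           // false
  escapedLiteral: literalQuestionMark.test("colou?r"),  // true
  escapedColour: literalQuestionMark.test("colour"),    // false
};

console.log(questionMarkChecks);
```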
Metacharacter – Caret ^
‘^’ – This is known as ‘Caret’ and is used to denote the beginning of a string (or the beginning of a line in multiline mode).
/^\/Colou?r/ => Check for a pattern which starts with ‘/Color’ or ‘/Colour’.
The regular expression /^\/Colou?r/ consists of four parts:
The start-of-line anchor ^ indicates that the regular expression should only match if the pattern appears at the beginning of a string.
The escaped forward slash \/ is a literal “/” character that the regular expression will try to match.
The string “Colo” is a literal string that the regular expression will try to match.
The letter “u” followed by the question mark (?), and then the letter “r“:
The question mark indicates that the preceding character (in this case, the letter “u”) is optional. It will match zero or one occurrence of the preceding character.
The letter “r” is a literal character that the regular expression will try to match.
Together, this regular expression will match strings that start with a forward slash “/”, followed by the characters “Colo”, followed by zero or one occurrence of the character “u”, and then the character “r”.
For example,
This regular expression would match the following strings:
“/Colour”: This string starts with “/Colour”.
“/Color”: This string starts with “/Color”.
/Colour/?proid=3456/review
/Color-red/?proid=3456/review
This regular expression would not match the following string:
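“/coloura”: This string starts with a forward slash, but the lowercase “c” does not match the uppercase “C” in the regex (JavaScript regex matching is case-sensitive by default).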
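As a rough check in JavaScript, the pattern can be run against the paths listed above. The last path, “/shop/Colour”, is a hypothetical addition to show the effect of the ^ anchor.

```javascript
const startsWithColor = /^\/Colou?r/;

const colorPaths = [
  "/Colour",
  "/Color",
  "/Colour/?proid=3456/review",
  "/Color-red/?proid=3456/review",
  "/coloura",     // lowercase "c": matching is case-sensitive
  "/shop/Colour", // "/Colour" is present, but not at the start
];

const colorResults = colorPaths.map((path) => startsWithColor.test(path));
console.log(colorResults);
```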
/^[nN]ov(ember)? 28(th)?$/
The regular expression /^[nN]ov(ember)? 28(th)?$/ consists of several parts:
The start-of-line anchor ^: This indicates that the regular expression should only match if the pattern appears at the beginning of a string.
The character set [nN]: This matches either the lowercase letter "n" or the uppercase letter "N".
The string "ov": This is a literal string that the regular expression will try to match.
The group (ember)?: This group consists of the string "ember" and the question mark (?). The question mark indicates that the preceding string is optional. It will match zero or one occurrence of the preceding string.
The string " 28": This is a literal string that the regular expression will try to match.
The group (th)?: This group consists of the string "th" and the question mark (?). The question mark indicates that the preceding string is optional. It will match zero or one occurrence of the preceding string.
The end-of-line anchor $: This indicates that the regular expression should only match if the pattern appears at the end of a string.
Together, this regular expression will match strings that consist of either "nov" or "Nov", optionally followed by "ember", followed by " 28", optionally followed by "th", with nothing before or after.
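The combined behaviour of the two anchors and the two optional groups can be sketched in JavaScript like this (the sample strings are illustrative):

```javascript
const nov28 = /^[nN]ov(ember)? 28(th)?$/;

const dateChecks = {
  short: nov28.test("Nov 28"),              // true
  shortWithTh: nov28.test("nov 28th"),      // true
  long: nov28.test("November 28"),          // true
  longWithTh: nov28.test("november 28th"),  // true
  monthOnly: nov28.test("Nov"),             // false: " 28" is missing
  trailing: nov28.test("Nov 28th 29th"),    // false: extra text before the end
};

console.log(dateChecks);
```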
For example, this regular expression would match the following strings:
"Nov 28": The whole string is "Nov" followed by " 28".
"Nov 28th": The optional "th" is present.
"november 28": The optional "ember" directly follows "nov", with no space in between.
"November 28th": Both optional parts are present.
It would not match the following strings:
"Nov": This string does not end with "28".
"Nov 28th 29th": This string does not end with "28th".
/^\/elearning\.html/ => Check for a pattern which starts with ‘/elearning.html’.
The regular expression /^\/elearning\.html/ consists of three parts:
The start-of-line anchor ^: This indicates that the regular expression should only match if the pattern appears at the beginning of a string.
The forward slash "/": This is a literal character that the regular expression will try to match.
The string "elearning.html": This is a literal string that the regular expression will try to match.
Together, this regular expression will match strings that start with a forward slash "/", followed by the characters "elearning.html".
For example, this regular expression would match the following string:
"/elearning.html": This string starts with "/elearning.html".
It would not match the following strings:
"/elearning": This string does not contain ".html" after "/elearning".
"elearning.html": This string does not start with a forward slash.
/^\/.*\.php/ => Check for a pattern which starts with any file with .php extension.
The regular expression /^\/.*\.php/ consists of three parts:
The start-of-line anchor ^: This indicates that the regular expression should only match if the pattern appears at the beginning of a string.
The forward slash "/": This is a literal character that the regular expression will try to match.
The sequence .*\.php: This sequence consists of the following two parts:
The dot (.) and the asterisk (*): The dot matches any single character, and the asterisk indicates that the preceding character (in this case, the dot) can be matched zero or more times. This group will therefore match any string of characters.
The string ".php": This is a literal string that the regular expression will try to match.
Together, this regular expression will match strings that start with a forward slash "/", followed by any string of characters, followed by the characters ".php".
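A small JavaScript sketch of this pattern follows. The paths other than the ones mentioned in the text are hypothetical, and the last one shows that, without a $ anchor, text after “.php” is still allowed.

```javascript
const phpPattern = /^\/.*\.php/;

const phpChecks = {
  simple: phpPattern.test("/abc.php"),              // true
  nested: phpPattern.test("/path/to/file.php"),     // true
  noLeadingSlash: phpPattern.test("abc.php"),       // false
  otherExtension: phpPattern.test("/abc.html"),     // false
  withQuery: phpPattern.test("/abc.php?id=1"),      // true: no "$" anchor
};

console.log(phpChecks);
```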
For example, this regular expression would match the following strings:
"/abc.php": This string starts with "/abc.php".
"/path/to/file.php": This string starts with "/path/to/file.php".
/^\/product-price\.php/ => Check for a pattern which starts with ‘/product-price.php’.
The regular expression /^\/product-price\.php/ consists of three parts:
The start-of-line anchor ^: This indicates that the regular expression should only match if the pattern appears at the beginning of a string.
The forward slash "/": This is a literal character that the regular expression will try to match.
The string "product-price.php": This is a literal string that the regular expression will try to match.
Together, this regular expression will match strings that start with a forward slash "/", followed by the characters "product-price.php".
For example, this regular expression would match the following string:
"/product-price.php": This string starts with "/product-price.php".
Caret also means NOT when used after the opening square bracket.
/[^a]/ => Check for any single character other than the lowercase letter ‘a’.
The regular expression /[^a]/ consists of two parts:
The character set [^a]: This matches any single character that is NOT the letter "a". The caret (^) inside the square brackets indicates that the character set should match any character that is NOT in the set.
The forward slash "/": This indicates the end of the regular expression.
Together, this regular expression will match any single character that is NOT the letter "a".
For example, this regular expression would match the following strings:
“bcd”
“defg”
“hijkl”
/[^B]/ => Check for any single character other than the uppercase letter ‘B’.
For example: the regex /product-[^B]/ will match the following strings:
/shop/men/sales/product-b
/shop/men/sales/product-c
/[^1]/ => Check for any single character other than the number ‘1’.
For example: the regex /proid=[^1]/ will match the string:
/men/product-b?proid=3456&gclid=153dwf3533
but will not match the string:
/men/product-b?proid=1456&gclid=153dwf3533
/[^ab]/ => Check for any single character other than the lowercase letters ‘a’ and ‘b’.
For example: the regex /location=[^ab]/ will match the string:
/shop/collection/prodID=141?location=canada
but will not match the strings:
/shop/collection/prodID=141?location=america
/shop/collection/prodID=141?location=bermuda
/[^aB]/ => Check for any single character other than the lower case letter ‘a’ and uppercase letter ‘B’.
Here are a few examples of strings that will all be matched by this regular expression:
"c": This string consists of a single character that is not "a" or "B".
"xyz": This string consists of three characters that are not "a" or "B".
"123": This string consists of three characters that are not "a" or "B".
"#$%&": This string consists of four characters that are not "a" or "B".
/[^1B]/ => Check for any single character other than the number ‘1’ and uppercase letter ‘B’
Here are a few examples of strings that will all be matched by this regular expression:
"a": This string consists of a single character that is not "1" or "B".
"xyz": This string consists of three characters that are not "1" or "B".
"123": This string contains the characters "2" and "3", which are not "1" or "B".
"#$%&": This string consists of four characters that are not "1" or "B".
/[^Dog]/ => Check for any single character other than the following: uppercase letter ‘D’, lowercase letter ‘o’ and the lowercase letter ‘g’.
For example: the regex /location=[^Dog]/ will match:
/shop/collection/prodID=141?location=canada
/shop/collection/prodID=141?location=denmark
but will not match:
/shop/collection/prodID=141?location=Denver
/shop/collection/prodID=141?location=ontario
/shop/collection/prodID=141?location=greenland
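The negated character set can be tried out in JavaScript with the URLs from the example. This is a sketch only; the helper name matchesLocation is an assumption.

```javascript
// Matches "location=" followed by one character that is not 'D', 'o' or 'g'.
const notDog = /location=[^Dog]/;

// Hypothetical helper: builds the example URL for a given location value.
const matchesLocation = (value) =>
  notDog.test("/shop/collection/prodID=141?location=" + value);

const locationResults = {
  canada: matchesLocation("canada"),       // true
  denmark: matchesLocation("denmark"),     // true: lowercase 'd' is not in the set
  Denver: matchesLocation("Denver"),       // false
  ontario: matchesLocation("ontario"),     // false
  greenland: matchesLocation("greenland"), // false
};

console.log(locationResults);
```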
/[^123b]/ => Check for any single character other than the following characters: number ‘1’, number ‘2’, number ‘3’ and lowercase letter ‘b’.
Here are a few examples of strings that will all be matched by this regular expression:
"a": This string consists of a single character that is not "1", "2", "3", or "b".
"xyz": This string consists of three characters that are not "1", "2", "3", or "b".
"#$%&": This string consists of four characters that are not "1", "2", "3", or "b".
/[^1-3]/ => Check for any single character other than the following: number ‘1’, number ‘2’ and number ‘3’.
For example: the regex /prodID=[^1-3]/ will match:
/shop/collection/prodID=45321&cid=1313
/shop/collection/prodID=5321&cid=13442
but will not match:
/shop/collection/prodID=12321&cid=1313
/shop/collection/prodID=2321&cid=1313
/shop/collection/prodID=321&cid=1313
/[^0-9]/ => Check for any single character other than a digit.
For example: the regex /de\/[^0-9]/ will match all those pages in the de/ folder whose name doesn’t start with a number:
/de/school-london
/de/general/
but will not match:
/de/12fggtyooo
/[^a-z]/ => Check for any single character which is not a lowercase letter.
For example: the regex /de\/[^a-z]/ will match all those pages in the de/ folder whose name doesn’t start with a lowercase letter:
/de/1london-school
/de/?productid=423543
but will not match:
/de/school/london
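Both folder patterns can be checked side by side in JavaScript. This is a sketch using the page paths from the two examples above.

```javascript
const notDigitAfterDe = /de\/[^0-9]/;     // first character after "de/" is not a digit
const notLowercaseAfterDe = /de\/[^a-z]/; // first character after "de/" is not a-z

const folderChecks = {
  schoolLondon: notDigitAfterDe.test("/de/school-london"),      // true
  general: notDigitAfterDe.test("/de/general/"),                // true
  startsWithDigit: notDigitAfterDe.test("/de/12fggtyooo"),      // false
  digitFirst: notLowercaseAfterDe.test("/de/1london-school"),   // true
  queryFirst: notLowercaseAfterDe.test("/de/?productid=423543"),// true
  letterFirst: notLowercaseAfterDe.test("/de/school/london"),   // false
};

console.log(folderChecks);
```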
/[^A-Z]/ => Check for any single character which is not an upper case letter.
Here are a few examples of strings that will all be matched by the regular expression /[^A-Z]/:
"a": This string consists of a single character that is not an uppercase letter.
"xyz": This string consists of three characters that are not uppercase letters.
"123": This string consists of three characters that are not uppercase letters.
Metacharacter – Dollar $
‘$’ – It is used to denote the end of a string (or the end of a line in multiline mode).
Examples
/Colou?r$/ => Check for a pattern which ends with ‘Color’ or ‘Colour’
/Nov(ember)?$/ => Check for a pattern which ends with ‘Nov’ or ‘November’
/elearning\.html$/ => Check for a pattern which ends with ‘elearning.html’
/\.php$/ => Check for a pattern which ends with .php
/product-price\.php$/ => Check for a pattern which ends with ‘product-price.php’
Metacharacter – Square Bracket []
‘[]’ – This square bracket is used to check for any single character in the character set specified in [].
Examples
/[a]/ => Check for a single character which is a lowercase letter ‘a’.
/[ab]/ => Check for a single character which is either a lower case letter ‘a’ or ‘b’.
/[aB]/ => Check for a single character which is either a lower case letter ‘a’ or uppercase letter ‘B’.
/[1B]/ => Check for a single character which is either a number ‘1’ or an uppercase letter ‘B’.
/[Dog]/ => Check for a single character which can be any one of the following: uppercase letter ‘D’, lower case letter ‘o’ or the lowercase letter ‘g’.
/[123b]/ => Check for a single character which can be any one of the following: number ‘1’, number ‘2’, number ‘3’ or lowercase letter ‘b’.
/[1-3]/ => Check for a single character which can be any one number from 1, 2 and 3.
/[0-9]/ => Check for a single character which is a number.
/[a-d]/ => Check for a single character which can be any one of the following lowercase letter: ‘a’, ‘b’, ‘c’ or ‘d’.
/[a-z]/ => Check for a single character which is a lowercase letter.
/[A-Z]/ => Check for a single character which is an upper case letter.
/[A-T]/ => Check for a single character which can be any uppercase letter from ‘A’ to ‘T’.
/[home.php]/ => Check for a single character which can be any one of the following characters:
lowercase letter ‘h’,
lowercase letter ‘o’,
lowercase letter ‘m’,
lowercase letter ‘e’,
special character ‘.’,
lower case letter ‘p’,
lowercase letter ‘h’ or
lowercase letter ‘p’
Note: If you want to check for a letter regardless of its case (upper case or lowercase) then use the regex /[a-zA-Z]/.
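A character set always matches exactly one character, whatever is listed inside it. The following JavaScript sketch illustrates this with a few of the sets above; the test strings are hypothetical.

```javascript
// [home.php] is a set of single characters, not the word "home.php".
// Inside square brackets the dot is a literal dot, not "any character".
const homeSet = /[home.php]/;

const setChecks = {
  dotOnly: homeSet.test("xyz."),          // true: "." is in the set
  noSetChar: homeSet.test("xyz"),         // false: none of h, o, m, e, ., p
  oneToThree: /[1-3]/.test("prodID=2"),   // true
  notOneToThree: /[1-3]/.test("prodID=7"),// false
  anyLetter: /[a-zA-Z]/.test("123a"),     // true
  noLetter: /[a-zA-Z]/.test("1234"),      // false
};

console.log(setChecks);
```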
Metacharacter – Parenthesis ()
‘()’ – This is known as parenthesis and is used to check for a string.
Examples
/(a)/ => Check for string ‘a’
/(ab)/ => Check for string ‘ab’
/(dog)/ => Check for string ‘dog’
/(dog123)/ => Check for string ‘dog123’
/(0-9)/ => Check for string ‘0-9’
/(A-Z)/ => Check for string ‘A-Z’
/(a-z)/ => Check for string ‘a-z’
/(123dog588)/ => Check for string ‘123dog588’
Note: () is also used to create capturing groups, which store the matched text so that it can be reused later. For e.g. /^(.*)$/
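To illustrate how the text matched inside parentheses is stored, here is a minimal JavaScript sketch. The paths and the rewritten format are hypothetical and only serve the example.

```javascript
// exec() returns the full match at index 0 and each () group after it.
const shopMatch = /^\/shop\/(.*)$/.exec("/shop/collection/men/");
const restOfPath = shopMatch[1]; // "collection/men/"

// In replace(), $1 and $2 refer to the first and second group.
const rewritten = "/men/shoes".replace(
  /^\/([a-z]+)\/([a-z]+)$/,
  "/$2-for-$1"
); // "/shoes-for-men"

console.log(restOfPath, rewritten);
```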
Metacharacter – Question Mark ?
‘?’ is used to check for zero or one occurrence of the preceding character.
For example:
/[a]?/ => Check for zero or one occurrence of the lowercase letter ‘a’.
The regular expression /[a]?/ consists of two parts:
The character set [a]: This matches a single character, the letter "a".
The question mark (?): This indicates that the preceding character set or pattern is optional. It will match zero or one occurrence of the preceding character set or pattern.
Together, this regular expression will match zero or one occurrence of the letter "a".
For example, the following strings will all be matched by this regular expression:
“” (empty string, no occurrences of “a”)
“a” (one occurrence of “a”)
“abc” (one occurrence of “a”)
/[dog]?/ => Check for zero or one occurrence of the lowercase letter ‘d’, ‘o’ or ‘g’.
The regular expression /[dog]?/ consists of two parts:
The character set [dog]: This matches a single character that is either "d", "o", or "g".
The question mark (?): This indicates that the preceding character set or pattern is optional. It will match zero or one occurrence of the preceding character set or pattern.
Together, this regular expression will match zero or one occurrence of the characters "d", "o", or "g".
For example, the following strings will all be matched by this regular expression:
“” (empty string, no occurrences of “d”, “o”, or “g”)
“d” (one occurrence of “d”)
“o” (one occurrence of “o”)
“g” (one occurrence of “g”)
/[^dog]?/ => Check for zero or one occurrence of a character which is not the lowercase letter ‘d’, ‘o’ or ‘g’.
The regular expression /[^dog]?/ consists of two parts:
The character set [^dog]: This matches any character that is NOT "d", "o", or "g". The caret (^) inside the square brackets indicates that the character set should match any character that is NOT in the set.
The question mark (?): This indicates that the preceding character set or pattern is optional. It will match zero or one occurrence of the preceding character set or pattern.
Together, this regular expression will match zero or one occurrence of any character that is NOT "d", "o", or "g".
For example, the following strings will all be matched by this regular expression:
“” (empty string, no occurrences of characters that are not “d”, “o”, or “g”)
“a” (one occurrence of a letter that is not “d”, “o”, or “g”)
“!” (one occurrence of a punctuation mark that is not “d”, “o”, or “g”)
“a!” (one occurrence each of a letter and a punctuation mark that are not “d”, “o”, or “g”)
“a!0” (one occurrence each of a letter, a punctuation mark, and a digit that are not “d”, “o”, or “g”)
/[0-9]?/ => Check for zero or one occurrence of a number.
The regular expression /[0-9]?/ consists of two parts:
The character set [0-9]: This matches any single digit between 0 and 9, inclusive. The range 0-9 inside the square brackets indicates that the character set should match any character that is within that range.
The question mark (?): This indicates that the preceding character set or pattern is optional. It will match zero or one occurrence of the preceding character set or pattern.
Together, this regular expression will match zero or one occurrence of any single digit between 0 and 9, inclusive.
For example, the following strings will all be matched by this regular expression:
“” (empty string, no occurrences of digits)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
/[^a-z]?/ => Check for zero or one occurrence of a character which is not a lowercase letter.
The regular expression /[^a-z]?/ consists of two parts:
The character set [^a-z]: This matches any character that is not a lowercase letter in the alphabet. The caret (^) inside the square brackets indicates that the character set should match any character that is NOT in the set.
The question mark (?): This indicates that the preceding character set or pattern is optional. It will match zero or one occurrence of the preceding character set or pattern.
Together, this regular expression will match zero or one occurrence of any character that is NOT a lowercase letter in the alphabet.
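One consequence of “zero or one” is easy to miss: the pattern can succeed with an empty match. The JavaScript sketch below (variable names are illustrative) shows what exec() returns in a few cases.

```javascript
const optionalNonLower = /[^a-z]?/;

// The string starts with an uppercase letter, so one character is matched.
const matchOnUpper = optionalNonLower.exec("Abc")[0];     // "A"

// The string starts with a lowercase letter, so the regex matches zero
// characters at index 0 and stops there.
const matchOnMixed = optionalNonLower.exec("abc123")[0];  // ""
const matchOnLower = optionalNonLower.exec("abc")[0];     // ""
const lowerStillMatches = optionalNonLower.test("abc");   // true

// Without the "?", at least one such character is required.
const requiredMatch = /[^a-z]/.exec("abc123")[0];         // "1"
const requiredOnLower = /[^a-z]/.test("abc");             // false

console.log(matchOnUpper, matchOnMixed, matchOnLower, requiredMatch);
```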
Because the question mark also allows zero occurrences, this regular expression finds a match (possibly an empty one) in every string. For example:
“Abc” – The match would be the A character, because the string starts with an uppercase letter, which is not a lowercase letter.
“abc123” – The first match would be an empty match at the start of the string, because the first character is a lowercase letter. The digits are only picked up when the regex is applied repeatedly along the string (or when the ? is dropped, as in /[^a-z]/).
“abc” – This string is also matched, but only with an empty match, because it does not contain any characters that are not lowercase letters.
Metacharacter – Plus +
‘+‘ is used to check for one or more occurrences of the preceding character.
For example:
/[a]+/ => Check for one or more occurrences of the lowercase letter ‘a’.
For example, the following strings will all be matched by this regular expression:
“a” (one occurrence of “a”)
“aa” (two occurrences of “a”)
“aaa” (three occurrences of “a”)
/[dog]+/ => Check for one or more occurrences of letters ‘d’, ‘o’ or ‘g’ (in any order).
For example, the following strings will all be matched by this regular expression:
“d” (one occurrence of “d”)
“o” (one occurrence of “o”)
“g” (one occurrence of “g”)
“dog” (one occurrence each of “d”, “o”, and “g”)
“god” (one occurrence each of “g”, “o”, and “d”)
“godog” (two occurrences each of “g” and “o”, and one occurrence of “d”)
/[548]+/ => Check for one or more occurrences of numbers ‘5’, ‘4’ or ‘8’ (in any order).
For example, the following strings will all be matched by this regular expression:
“5” (one occurrence of “5”)
“4” (one occurrence of “4”)
“8” (one occurrence of “8”)
“548” (one occurrence each of “5”, “4”, and “8”)
“854” (one occurrence each of “8”, “5”, and “4”)
“854458” (two occurrences each of “8”, “5”, and “4”)
/[0-9]+/ => Check for one or more occurrences of a number.
For example, the following strings will all be matched by this regular expression:
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“01” (one occurrence each of “0” and “1”)
“09” (one occurrence each of “0” and “9”)
“0123456789” (one occurrence each of “0” through “9”)
/[a-z]+/ => Check for one or more occurrences of a lowercase letter.
For example, the following strings will all be matched by this regular expression:
“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“ab” (one occurrence each of “a” and “b”)
“az” (one occurrence each of “a” and “z”)
“abcdefghijklmnopqrstuvwxyz” (one occurrence each of “a” through “z”)
/[^a-z]+/ => Check for one or more characters which are not lowercase letters.
For example, the following strings will all be matched by this regular expression:
“0” (one occurrence of a digit that is not a lowercase letter)
“!” (one occurrence of a punctuation mark that is not a lowercase letter)
“0!” (one occurrence each of a digit and a punctuation mark that are not lowercase letters)
“0!A” (one occurrence each of a digit, a punctuation mark, and an uppercase letter that are not lowercase letters)
/[a-zA-Z]+/ => Check for one or more occurrences of uppercase and lowercase letters.
For example, the following strings will all be matched by this regular expression:
“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“A” (one occurrence of “A”)
“Z” (one occurrence of “Z”)
“ab” (one occurrence each of “a” and “b”)
“az” (one occurrence each of “a” and “z”)
“AZ” (one occurrence each of “A” and “Z”)
“aA” (one occurrence each of “a” and “A”)
“abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ” (one occurrence each of “a” through “z” and “A” through “Z”)
/[a-z0-9]+/ => Check for one or more occurrences of lowercase letters and numbers.
For example, the following strings will all be matched by this regular expression:
“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“ab” (one occurrence each of “a” and “b”)
“az” (one occurrence each of “a” and “z”)
“09” (one occurrence each of “0” and “9”)
“a0” (one occurrence each of “a” and “0”)
“abcdefghijklmnopqrstuvwxyz0123456789” (one occurrence each of “a” through “z” and “0” through “9”)
/[A-Z0-9]+/ => Check for one or more occurrences of uppercase letters and numbers.
For example, the following strings will all be matched by this regular expression:
“A” (one occurrence of “A”)
“Z” (one occurrence of “Z”)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“AB” (one occurrence each of “A” and “B”)
“AZ” (one occurrence each of “A” and “Z”)
“09” (one occurrence each of “0” and “9”)
“A0” (one occurrence each of “A” and “0”)
“ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789” (one occurrence each of “A” through “Z” and “0” through “9”)
/[^9]+/ => Check for one or more occurrences of characters but not the number 9.
For example, the following strings will all be matched by this regular expression:
“a” (one occurrence of a letter that is not “9”)
“!” (one occurrence of a punctuation mark that is not “9”)
“a!” (one occurrence each of a letter and a punctuation mark that are not “9”)
“a!0” (one occurrence each of a letter, a punctuation mark, and a digit that are not “9”)
/31+/ => Check for the number 3 followed by one or more occurrences of the number 1. The ‘+’ applies only to the character immediately before it (here ‘1’), not to the whole sequence ‘31’.
For example, the following strings will all be matched by this regular expression:
“31” (“3” followed by one occurrence of “1”)
“311” (“3” followed by two occurrences of “1”)
“3111” (“3” followed by three occurrences of “1”)
However, the following strings will not be matched:
“” (an empty string, which contains no “3”)
“3” (one occurrence of “3”, but not followed by “1”)
“1” (one occurrence of “1”, but not preceded by “3”)
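To see how ‘+’ behaves, here is a minimal JavaScript sketch. The helper name matchesWhole is my own (it is not part of GA or GTM), and it wraps each pattern in ^ and $ so that the whole string has to match.

```javascript
// Hypothetical helper: true only when the whole string matches the pattern.
function matchesWhole(pattern, text) {
  return new RegExp("^(?:" + pattern + ")$").test(text);
}

const plusChecks = {
  dogSet: matchesWhole("[dog]+", "godog"),      // letters from the set, in any order
  digits: matchesWhole("[0-9]+", "0123456789"), // one or more digits
  empty: matchesWhole("[0-9]+", ""),            // '+' needs at least one character
  threeOnes: matchesWhole("31+", "3111"),       // '3' followed by several '1'
  repeated31: matchesWhole("31+", "3131"),      // '+' repeats only the '1'
  grouped31: matchesWhole("(31)+", "3131")      // parentheses make '+' repeat '31'
};

console.log(plusChecks);
```

Only empty and repeated31 come out as false. If you want to repeat a whole sequence such as ‘31’, wrap it in parentheses first.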
Metacharacter – Multiply *
‘*‘ is used to check for any number of occurrences (including zero occurrences) of the preceding character.
For example:
/[a]*/ => Check for zero or more occurrences of the lowercase letter ‘a’.
For example, the following strings will all be matched by this regular expression:
“” (an empty string, zero occurrences of “a”)
“a” (one occurrence of “a”)
“aa” (two occurrences of “a”)
“aaa” (three occurrences of “a”)
/[dog]*/ => Check for zero or more occurrences of letters ‘d’, ‘o’ or ‘g’ (in any order).
For example, the following strings will all be matched by this regular expression:
“” (an empty string, zero occurrences of “d”, “o”, or “g”)
“d” (one occurrence of “d”)
“g” (one occurrence of “g”)
“o” (one occurrence of “o”)
“dog” (one occurrence each of “d”, “o”, and “g”)
“god” (one occurrence each of “g”, “o”, and “d”)
“ogd” (one occurrence each of “o”, “g”, and “d”)
/[548]*/ => Check for zero or more occurrences of numbers ‘5’, ‘4’ or ‘8’ (in any order).
For example, the following strings will all be matched by this regular expression:
“” (an empty string, zero occurrences of “5”, “4”, or “8”)
“5” (one occurrence of “5”)
“4” (one occurrence of “4”)
“8” (one occurrence of “8”)
“54” (one occurrence each of “5” and “4”)
“85” (one occurrence each of “8” and “5”)
“548” (one occurrence each of “5”, “4”, and “8”)
/[0-9]*/ => Check for zero or more occurrences of a number.
For example, the following strings will all be matched by this regular expression:
“” (an empty string, zero occurrences of “0” through “9”)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“89” (one occurrence each of “8” and “9”)
“1234” (one occurrence each of “1”, “2”, “3”, and “4”)
/[a-z]*/ => Check for zero or more occurrences of a lowercase letter.
For example, the following strings will all be matched by this regular expression:
“” (an empty string, zero occurrences of “a” through “z”)
“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“az” (one occurrence each of “a” and “z”)
“abc” (one occurrence each of “a”, “b”, and “c”)
/[^a-z]*/ => Check for zero or more characters which are not lowercase letters.
For example, the following strings will all be matched by this regular expression:
“” (an empty string, zero occurrences of any characters other than “a” through “z”)
“0” (one occurrence of a digit that is not a lowercase letter)
“Z” (one occurrence of an uppercase letter that is not a lowercase letter)
“!” (one occurrence of a punctuation mark that is not a lowercase letter)
“0Z!” (one occurrence each of a digit, an uppercase letter, and a punctuation mark that are not lowercase letters)
/[a-zA-Z]*/ => Check for zero or more occurrences of uppercase or lowercase letters.
For example, the following strings will all be matched by this regular expression:
“” (an empty string, zero occurrences of “a” through “z” or “A” through “Z”)
“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“A” (one occurrence of “A”)
“Z” (one occurrence of “Z”)
“az” (one occurrence each of “a” and “z”)
“azAZ” (one occurrence each of “a”, “z”, “A”, and “Z”)
/[a-z0-9]*/ => Check for zero or more occurrences of lowercase letters and numbers.
For example, the following strings will all be matched by this regular expression:
“” (an empty string, zero occurrences of “a” through “z” or “0” through “9”)
“a” (one occurrence of “a”)
“z” (one occurrence of “z”)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“az” (one occurrence each of “a” and “z”)
“09” (one occurrence each of “0” and “9”)
“abc123” (one occurrence each of “a”, “b”, “c”, “1”, “2”, and “3”)
/[A-Z0-9]*/ => Check for zero or more occurrences of uppercase letters and numbers.
For example, the following strings will all be matched by this regular expression:
“” (an empty string, zero occurrences of “A” through “Z” or “0” through “9”)
“A” (one occurrence of “A”)
“Z” (one occurrence of “Z”)
“0” (one occurrence of “0”)
“9” (one occurrence of “9”)
“AZ” (one occurrence each of “A” and “Z”)
“09” (one occurrence each of “0” and “9”)
“ABC123” (one occurrence each of “A”, “B”, “C”, “1”, “2”, and “3”)
/[^9]*/ => Check for zero or more occurrences of characters but not the number 9.
For example, the following strings will all be matched by this regular expression:
“” (an empty string, zero occurrences of any characters other than “9”)
“a” (one occurrence of a letter that is not “9”)
“0” (one occurrence of a digit that is not “9”)
“!” (one occurrence of a punctuation mark that is not “9”)
“a0!” (one occurrence each of a letter, a digit, and a punctuation mark that are not “9”)
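/31*/ => Check for the number 3 followed by zero or more occurrences of the number 1. The ‘*’ applies only to the character immediately before it (here ‘1’).
For example, the following strings will all be matched by this regular expression:
“3” (“3” followed by zero occurrences of “1”)
“31” (“3” followed by one occurrence of “1”)
“311” (“3” followed by two occurrences of “1”)
“3111” (“3” followed by three occurrences of “1”)
However, the following strings will not be matched:
“” (an empty string, which contains no “3”)
“1” (one occurrence of “1”, but not preceded by “3”)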
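Here is a small JavaScript sketch of ‘*’. The variable names are my own, and the first pattern is anchored with ^ and $ so that the whole string has to match.

```javascript
// '3' followed by zero or more '1', anchored to the whole string.
const starPattern = /^31*$/;

const starResults = ["3", "31", "3111", "", "1"].map(function (text) {
  return starPattern.test(text);
});

// Because '*' allows zero occurrences, an unanchored /[a]*/ finds an
// (empty) match in any string, even one without the letter 'a'.
const matchesAnything = /[a]*/.test("xyz");
const emptyMatch = "xyz".match(/[a]*/)[0];

console.log(starResults, matchesAnything, JSON.stringify(emptyMatch));
```

The last two lines are the practical lesson: a pattern made up only of ‘*’ terms can match nothing at all, so add anchors or use ‘+’ when at least one character is required.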
Metacharacter – Dot .
‘.’ is used to check for a single character (any character that can be typed via a keyboard) other than a line break character (\n).
Here are some examples of strings that would match the regular expression /./ (each string contains at least one character, and the regex matches the first one):
a
1
#
hello
goodbye
123
abc
Similarly, the regular expression: /Action ., Scene2/ would match the following strings:
Action 1, Scene2
Action A, Scene2
Action 9, Scene2
Action &, Scene2
Here are some examples of strings that would not match the regular expression /Action ., Scene2/:
Action123, Scene2 (does not contain a space character after Action)
Action , Scene2 (does not contain any character between the space and the comma)
Action,Scene2 (does not contain a space character after the comma)
Action Scene2 (does not contain a comma)
Scene2, Action (characters are not in the correct order)
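The dot examples can be checked with a few lines of JavaScript. This is only a sketch: the variable names are my own, and I have added ^ and $ to the second pattern so that the whole string is compared.

```javascript
// '.' matches exactly one character, but never a line break.
const anyChar = /./;
const scenePattern = /^Action ., Scene2$/;

const dotChecks = {
  firstCharOfHello: "hello".match(anyChar)[0],          // only the first character is consumed
  lineBreak: anyChar.test("\n"),                        // a lone line break does not match
  digit: scenePattern.test("Action 1, Scene2"),
  symbol: scenePattern.test("Action &, Scene2"),
  noSpace: scenePattern.test("Action123, Scene2"),      // no space after 'Action'
  nothingBetween: scenePattern.test("Action , Scene2")  // no character before the comma
};

console.log(dotChecks);
```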
Metacharacter – Pipe Symbol |
The metacharacter ‘|’ is used to create the logical OR condition.
For example:
The regular expression /His|Her/ will match any string that contains either the string ‘His‘ or the string ‘Her‘.
Here are some examples of strings that would match this regular expression:
His
Her
His book
Her book
His or Her book
Because this regex is not anchored, it also matches strings in which ‘His’ or ‘Her’ appears inside a longer word:
HisHer
HisOrHer
Hers
Here are some examples of strings that would not match this regular expression:
book (does not contain either His or Her)
this is his book (the match is case-sensitive, so the lowercase ‘his’ does not count as ‘His’)
Another example:
The regular expression /his|her|^their|its*|our+/ will match any string that contains any of the following patterns:
The string ‘his‘
The string ‘her‘
The string ‘their‘ at the start of the string
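The string ‘it’ followed by zero or more occurrences of the letter ‘s’ (for example ‘it’ or ‘its’)
The string ‘ou’ followed by one or more occurrences of the letter ‘r’ (for example ‘our’)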
Here are some examples of strings that would match this regular expression:
his
her
their book
their cat
its cat
its cat and its dog
our cat
our cat and our dog
Note: the strings hiss and herr also match this regular expression, because the unanchored patterns his and her are found inside them.
Here are some examples of strings that would not match this regular expression:
HIS (the match is case-sensitive)
book (does not contain his, her, their, its, or our)
cat (does not contain his, her, their, its, or our)
cat and dog (does not contain his, her, their, its, or our)
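The pipe examples are easy to get wrong because an unanchored alternative matches anywhere in the string. The JavaScript sketch below shows the difference; the anchored and word-boundary variants are my own additions for comparison.

```javascript
const loose = /His|Her/;             // matches anywhere in the string
const wholeWord = /\b(His|Her)\b/;   // 'His' or 'Her' as a separate word
const exact = /^(His|Her)$/;         // the whole string must be 'His' or 'Her'

const pipeChecks = {
  looseHers: loose.test("Hers"),                    // 'Her' is found inside 'Hers'
  looseHisHer: loose.test("HisHer"),
  looseLowercase: loose.test("this is his book"),   // case-sensitive, so no match
  ignoreCase: /His|Her/i.test("this is his book"),  // the i flag ignores case
  wordHers: wholeWord.test("Hers"),
  wordHisBook: wholeWord.test("His book"),
  exactHers: exact.test("Hers"),
  exactHer: exact.test("Her")
};

console.log(pipeChecks);
```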
Metacharacter – Exclamation !
In JavaScript regular expressions, the exclamation symbol ‘!’ is not a negation operator on its own. It is treated as a literal character and only takes on a special meaning as part of a negative lookahead, which starts with (?! and ends with ).
Note: To negate a character set, use the caret ‘^’ as the first character inside the square brackets, as in [^a-z].
Examples
/![a-z]/ => Check for the character ‘!’ followed by a single lowercase letter. This regex will match !a but not A.
/[^a-z]/ => Check for a single character which is not a lowercase letter.
/[!a-z]/ => Check for a single character, either ‘!’ or a lowercase letter. Here ‘!’ is just one more member of the character set.
/!(abc)/ => Check for the string ‘!abc’. The parentheses only group ‘abc’.
/(!abc)/ => Check for the string ‘!abc’. Here ‘!’ is a literal character inside the group.
/![0-9]/ => Check for the character ‘!’ followed by a single number.
/[^0-9]/ => Check for a single character which is not a number.
/[!0-9]/ => Check for a single character which is either ‘!’ or a number.
/a!b/ => Check for the string ‘a!b’. Here ‘!’ is a literal character.
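You can verify how JavaScript treats ‘!’ with the sketch below. The variable names are my own. The last pattern uses a negative lookahead, which is the only place where ‘!’ takes part in negation.

```javascript
const bangChecks = {
  literalBang: /![a-z]/.test("!a"),      // '!' is matched as an ordinary character
  notNegation: /![a-z]/.test("A"),       // 'A' alone contains no '!', so no match
  negatedSet: /[^a-z]/.test("A"),        // '^' inside brackets negates the set
  bangInSet: /[!a-z]/.test("!"),         // '!' is just a member of the set
  notAbc: /^(?!abc$).*$/.test("abc"),    // negative lookahead rejects exactly 'abc'
  notAbcOther: /^(?!abc$).*$/.test("abd")
};

console.log(bangChecks);
```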
Metacharacter – Curly Brackets {}
{} is used to check for an exact number of occurrences, or a range of occurrences, of the preceding character, character set or group.
It is just like the metacharacter ‘+’ but it provides more control over the number of occurrences of the preceding character you want to match.
For example:
1{1} => check for 1 occurrence of the character ‘1’. This regex will match 1
1{2} => check for 2 occurrences of the character ‘1’. This regex will match 11
1{3} => check for 3 occurrences of the character ‘1’. This regex will match 111
1{4} => check for 4 occurrences of the character ‘1’. This regex will match 1111
1{1,4} => check for 1 to 4 occurrences of the character ‘1’. This regex will match 1, 11, 111, 1111
[0-9]{2} => check for 2 occurrences of a number or, in other words, check for a two-digit number like 12
[0-9]{3} => check for 3 occurrences of a number or, in other words, check for a three-digit number like 123
[0-9]{4} => check for a four-digit number like 1234
[0-9]{1,4} => check for a number with 1 to 4 digits.
[a]{1} => check for 1 occurrence of the character ‘a’. This regex will match a
[a]{2} => check for 2 occurrences of the character ‘a’. This regex will match aa
[a]{3} => check for 3 occurrences of the character ‘a’. This regex will match aaa
[a]{4} => check for 4 occurrences of the character ‘a’. This regex will match aaaa
[a]{1,4} => check for 1 to 4 occurrences of the character ‘a’. This regex will match a, aa, aaa, aaaa
[a-z]{2} => check for 2 occurrences of a lowercase letter. The two letters do not have to be the same. This regex will match aa, ab, zz etc
[A-Z]{3} => check for 3 occurrences of an uppercase letter. This regex will match AAA, ABC, XYZ etc
[a-zA-Z]{2} => check for 2 occurrences of a letter (doesn’t matter whether it is uppercase or lowercase). This regex will match aa, aA, Aa, AA etc
[a-zA-Z]{1,4} => check for 1 to 4 occurrences of a letter (doesn’t matter whether it is uppercase or lowercase). This regex will match a, Ab, aaaa, AAAA, aAAA, AAAa etc
(rock){1} => check for 1 occurrence of the string ‘rock’. This regex will match: rock
(rock){2} => check for 2 occurrences of the string ‘rock’. This regex will match: rockrock
(rock){3} => check for 3 occurrences of the string ‘rock’. This regex will match: rockrockrock
(rock){1,4} => check for 1 to 4 occurrences of the string ‘rock’. This regex will match: rock, rockrock, rockrockrock, rockrockrockrock
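The curly bracket examples can be tried out with the JavaScript sketch below. The variable names are my own. Most patterns are anchored with ^ and $, because without anchors a pattern like [0-9]{2} also finds a match inside a longer number.

```javascript
const braceChecks = {
  threeDigits: /^[0-9]{1,4}$/.test("123"),     // 1 to 4 digits
  fiveDigits: /^[0-9]{1,4}$/.test("12345"),    // too long for {1,4}
  twoLetters: /^[a-z]{2}$/.test("ab"),         // the letters may differ
  rockTwice: /^(rock){2}$/.test("rockrock"),   // the group is repeated
  rockOnce: /^(rock){2}$/.test("rock"),
  unanchored: /[0-9]{2}/.test("12345")         // two digits are found inside
};

console.log(braceChecks);
```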
Metacharacter – White Spaces
To match a white space in a regular expression, just type the white space. For example:
/(Himanshu Sharma)/ => Check for the string ‘Himanshu Sharma’
/Himanshu Sharma/ => Check for the string ‘Himanshu Sharma’
Inverting Regex in JavaScript
Inverting a regex means inverting its meaning.
You can invert a regex in JavaScript by using positive and negative lookaheads.
Use positive lookahead if you want to match something that is followed by something else.
Use negative lookahead if you want to match something not followed by something else.
Positive Lookahead starts with (?= and ends with )
Negative Lookahead starts with (?! and ends with )
For example, the regex de\/[^a-z] will match all those pages in the de/ folder whose name doesn’t start with a lowercase letter:
/de/1london-school
/de/?productid=423543
but will not match:
/de/school/london
The inverse of this regular expression would be: match all those pages in the de/ folder whose name starts with a lowercase letter.
For example, the regex de\/(?![^a-z]) will match:
/de/school/london
but will not match:
/de/1london-school
/de/?productid=423543
Note: Modern JavaScript engines (ES2018 and later) support both lookahead and lookbehind, while older engines support only lookahead. Google Analytics doesn’t support either lookahead or lookbehind.
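Here is a minimal JavaScript sketch of the two patterns, run against the same page paths. The positive lookahead variant and the bare /de/ path are my own additions, to show where the two kinds of lookahead differ.

```javascript
const nonLowercase = /de\/[^a-z]/;        // original regex
const inverted = /de\/(?![^a-z])/;        // negative lookahead
const positive = /de\/(?=[a-z])/;         // positive lookahead

const paths = ["/de/1london-school", "/de/?productid=423543", "/de/school/london", "/de/"];

const lookaheadTable = paths.map(function (path) {
  return {
    path: path,
    nonLowercase: nonLowercase.test(path),
    inverted: inverted.test(path),
    positive: positive.test(path)
  };
});

console.log(lookaheadTable);
```

For the first three paths, inverted is always the opposite of nonLowercase. The bare /de/ path is matched by the negative lookahead (nothing follows, so nothing is forbidden) but not by the positive lookahead (which requires a lowercase letter to follow).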
More Regex Examples
^(.*)\.html$ => Check for any number of characters before .html and store them in a variable (a capturing group).
^dog$ => Check for the string ‘dog’
^a+$ => Check for one or more occurrences of a lower case letter ‘a’
^(abc)+$ => Check for one or more occurrences of the string ‘abc’.
^[a-z]+$ => Check for one or more occurrences of a lower case letter.
^(abc)*$ => Check for any number of occurrences of the string ‘abc’.
^a*$ => Check for any number of occurrences of the lower case letter ‘a’
#1. Find all the files which start with ‘elearning’ and which have the ‘.html’ file extension
^elearning.*\.html$
#2. Find all the PHP files
^.*\.php$
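File-name patterns like these are worth checking in JavaScript before you use them. In the sketch below, the file names and the isValid helper are made up for illustration. A quantifier such as ‘*’ needs something in front of it to repeat (usually the dot), so the sketch also checks that a pattern starting with ^* is rejected.

```javascript
// Hypothetical helper: does JavaScript accept this pattern at all?
function isValid(source) {
  try {
    new RegExp(source);
    return true;
  } catch (error) {
    return false;
  }
}

const fileChecks = {
  elearningHtml: /^elearning.*\.html$/.test("elearning-intro.html"),
  elearningPdf: /^elearning.*\.html$/.test("elearning-intro.pdf"),
  php: /^.*\.php$/.test("index.php"),
  notPhp: /^.*\.php$/.test("index.php5"),
  captured: /^(.*)\.html$/.exec("about-us.html")[1], // text before .html
  starWithDot: isValid("^.*\\.php$"),
  starWithoutDot: isValid("^*\\.php$")               // nothing to repeat
};

console.log(fileChecks);
```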
Testing Regular Expressions (REGEX)
Whether you consider yourself a beginner or advanced in the use of regex, you should always test your regular expressions.
You can test regular expressions through the following:
RegExp Tester chrome extension
Regex101.com online tool
The advanced table filter on the reporting interface in GA3 with the Regex option
The preview feature of your Custom Segment in GA3
GTM debug console window for testing regex used in triggers and variables.
Using the RegExp object to test regex in GTM at run time.
Testing Regex Method #6: Using ‘RegExp’ to test regex in GTM during run time
RegExp is a regular expression object which is used to store a regular expression in JavaScript.
For example:
var regex = /^\/search\/(.*)/;
Here,
‘regex’ (as in var regex) is a regular expression object which is used to store the regular expression “/^\/search\/(.*)/“
‘test’ and ‘exec’ Methods of the ‘RegExp’ object
Both ‘test’ and ‘exec’ are methods of the ‘RegExp’ object and are often used in Google Tag Manager to test regular expressions at run time.
The ‘test’ method is used to test for a match in a string.
It returns a boolean value: ‘true’ if it finds a match; otherwise, it returns ‘false’.
Syntax: RegExpObject.test(string to be searched)
For example:
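function() {
  // GTM Custom JavaScript variables are written as anonymous functions.
  var regex = /^\/search\/(.*)/;
  var pagePath = '/search/enhanced ecommerce tracking/';
  // test() returns true when pagePath starts with /search/
  if (regex.test(pagePath)) {
    // Index 1 of the exec() result holds the text captured by (.*)
    var searchTerm = regex.exec(pagePath)[1];
    var NewUri = "/search/?s=" + searchTerm;
    return NewUri;
  }
  return false;
}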
The ‘exec’ method (as in regex.exec) also tests for a match in a string.
But unlike ‘test’, it returns an array which contains the matched text, if it finds a match.
Otherwise, it returns null.
Syntax: RegExpObject.exec(string to be searched)
The first element of the returned array is the full matched text. The following elements are the texts captured by each pair of parentheses.
So for the regex ^\/search\/(.*) and pagePath = ‘/search/enhanced ecommerce tracking/’
The regex.exec(pagePath) = [‘/search/enhanced ecommerce tracking/’, ‘enhanced ecommerce tracking/’]
The regex.exec(pagePath)[0] = ‘/search/enhanced ecommerce tracking/’
The regex.exec(pagePath)[1] = ‘enhanced ecommerce tracking/’
So when we use regex.exec(pagePath)[1] we can extract the search string from the request URI.
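Since ‘exec’ returns null when there is no match, reading index 1 of the result directly will throw an error on pages that are not search pages. Here is a minimal sketch of a safer version. The function name extractSearchTerm and the /blog/regex-guide/ path are made up for this example.

```javascript
// Hypothetical helper: returns the captured search term, or null.
function extractSearchTerm(pagePath) {
  var regex = /^\/search\/(.*)/;
  var match = regex.exec(pagePath);
  if (match === null) {
    return null; // no match, so there is nothing to read at index 1
  }
  return match[1]; // text captured by (.*)
}

var found = extractSearchTerm('/search/enhanced ecommerce tracking/');
var notFound = extractSearchTerm('/blog/regex-guide/');

console.log(found, notFound);
```

Calling ‘exec’ once and checking the result also avoids running the same regex twice (once with ‘test’ and once with ‘exec’).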
Tips for using regex in Google Analytics and Google Tag Manager
#1: Use the “|” (pipe) symbol wisely. Since “|” represents the ‘or’ condition, it is not wise to use the pipe symbol at the beginning or end of the regular expression. A leading or trailing pipe adds an empty alternative that matches every string, which may then spoil your required dataset.
#2: If you are unsure about all the possible combinations in a regex, use “.*” to find a list of all possible combinations in your data set.
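#3: Don’t use unintended spaces in regular expressions. A white space in a regular expression is matched literally, so a stray space can ruin the results you are expecting. Using a regex tester tool is best before using regular expressions in Google Analytics or Google Tag Manager.
#4: Regular expressions in Google Analytics are not case-sensitive by default. However, you can make regular expressions case-sensitive in Google Analytics. In your own JavaScript code (for example in GTM), it is the other way round: a regex is case-sensitive unless you add the ‘i’ flag.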
#5: Google Analytics can support regular expressions with up to 256 characters. If your regular expression exceeds 256 characters, it won’t work. Hence make sure to keep your regex character limit below 256.
#6: If you use regular expressions in custom JavaScript tags using Google Tag Manager, always remember to add comments in front of regular expressions.
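A few of these tips can be demonstrated in plain JavaScript. The sketch below is only an illustration, and the page paths and names in it are made up.

```javascript
const tipChecks = {
  // Tip #1: a trailing pipe adds an empty alternative that matches anything.
  trailingPipe: /cart|checkout|/.test("/home"),
  cleanPipe: /cart|checkout/.test("/home"),

  // Tip #3: a space in the pattern must also be present in the data.
  spaceNeeded: /Himanshu Sharma/.test("HimanshuSharma"),

  // Tip #4: in JavaScript code, a regex is case-sensitive unless the i flag is set.
  caseSensitive: /colour/.test("Colour"),
  caseInsensitive: /colour/i.test("Colour")
};

console.log(tipChecks);
```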
How to become a regex power user overnight?
The biggest problem in using regex is creating the correct regex.
You can ask ChatGPT to create a JavaScript regex that matches your specified pattern.
Then test the regex supplied by ChatGPT via a tool like regex101.com. It’s likely to work, but check it against both the strings that should match and the strings that should not.
You no longer need a PhD in regex to use it.
Using regex with mod_rewrite and configuration directives
In order to block referrer spam in GA3 and use regex for other purposes (like SEO), you will need a good understanding of mod_rewrite, configuration directives (like RewriteEngine) and the .htaccess file.
About mod_rewrite
It is a module (function) written in the ‘C’ programming language: ‘mod_rewrite.c‘.
This module works only with Apache server 1.2 or later and is called from the .htaccess file (ASCII file, which contains configuration directives and rules for files and folders).
Through this module, you can:
Re-Write URLs
Redirect URLs
Solve Canonical URL issues
Solve Hotlinking issues
Block visitors from accessing a particular folder, file or the whole website.
Create custom 403 and 404 pages.
Deliver content on the basis of the IP address.
The benefits are endless.
RewriteRule
Syntax: RewriteRule pattern substitution [FLAGS]
Here pattern is a regular expression and substitution is a URL.
FLAGS can be [R], [F], [NC], [QSA], [L], [OR] etc.
[R] => Redirect. Its default value is 302. It can be assigned any redirect status code from 300 to 399. For e.g.
RewriteRule ^index\.html$ /index.php [r=301]
[F] => Forbidden. It is generally used with a hyphen (-). The hyphen tells the server not to perform any substitution. This flag tells the server not to fulfill the request and return the ‘403’ response code. For e.g.
RewriteRule ^product-price\.php$ - [F]
[NC] => It tells the server to ignore uppercase or lowercase when checking for patterns. For e.g.
RewriteRule ^him.*\.php$ - [NC]
[QSA] => Query String append. It tells the server to pass query string from the original URL to the new URL.
[L] => Last rule. This flag tells the server not to process any more rules.
[OR] => Logical OR. This flag is used as a logical OR for RewriteCond statements.
RewriteCond
This configuration directive tells the server to interpret the given statement as a condition for the rule which immediately follows it.
Syntax:
RewriteCond teststring condpattern [FLAGS]
RewriteRule pattern substitution [FLAGS]
Here, mod_rewrite first matches each URL with the given RewriteRule pattern.
If the URL doesn’t match the pattern, then mod_rewrite processes the next rule.
If the URL matches the pattern, then mod_rewrite looks for the corresponding RewriteCond.
If no corresponding RewriteCond exists, then the matched URL is replaced by the substitution.
If one or more corresponding RewriteConds exist, then each RewriteCond is processed in the order in which they appear, from top to bottom.
Each RewriteCond is processed by matching its test string against its corresponding condition pattern.
If the test string doesn’t match its condition pattern, then mod_rewrite processes the next rule; otherwise, it processes the next RewriteCond.
When all RewriteConds are successfully processed, then the matched URL is replaced by the substitution.
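The processing order described above can be sketched in JavaScript. This is a simplified model and not how Apache is actually implemented: the applyRule function, the shape of the rule object and the host names are all made up for illustration, and flags such as [OR] and [NC] are ignored.

```javascript
// Hypothetical model of one RewriteRule together with its RewriteConds.
function applyRule(url, rule, serverVars) {
  if (!rule.pattern.test(url)) {
    return url; // pattern did not match: move on to the next rule
  }
  for (const cond of rule.conds) {
    const testString = serverVars[cond.variable] || "";
    if (!cond.pattern.test(testString)) {
      return url; // a condition failed: the rule is skipped
    }
  }
  // Pattern and all conditions matched: apply the substitution.
  return url.replace(rule.pattern, rule.substitution);
}

const exampleRule = {
  pattern: /^index\.html$/,
  substitution: "index.php",
  conds: [{ variable: "HTTP_HOST", pattern: /^www\.example\.com$/ }]
};

const rewritten = applyRule("index.html", exampleRule, { HTTP_HOST: "www.example.com" });
const skipped = applyRule("index.html", exampleRule, { HTTP_HOST: "other.example.org" });

console.log(rewritten, skipped);
```

The point to remember is the order: the rule pattern is checked first, and the conditions are only evaluated for URLs that already match it.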
A test string can be:
1. A simple text
2. RewriteRule back reference
3. RewriteCond back reference
4. Server Variable
RewriteRule Back Reference
It is of the form $N, where N can be any number from 0 to 9. It is used to refer to the variable (captured group) that was created in the RewriteRule pattern. For e.g.
RewriteRule ^(.*)$ /index.php/$1 [L]
RewriteCond Back Reference
It is of the form %N, where N can be any number from 1 to 9. It is used to refer to the variable (captured group) that was created in the ‘condpattern’ of the last matched ‘RewriteCond’. For e.g.
RewriteCond %{HTTP_HOST} ^(123\.42\.162\.7)$
RewriteCond %1 ^123\.42\.162\.7$
RewriteRule ……………..
Server Variable
Syntax: %{Variable_Name}
E.g.
1. %{HTTP_HOST} – This variable returns the host name sent in the Host header of the request.
2. %{HTTP_USER_AGENT} – This variable gives information about the user’s operating system and browser.
3. %{QUERY_STRING} – This variable returns the query string.
4. %{HTTP_REFERER} – This variable returns the URL of the referer.
5. %{REMOTE_ADDR} – This variable returns the IP address of the visitor (the client making the request).
About .htaccess File
It is an ASCII file that contains configuration directives and rules for files, folders, and the whole website.
You can have more than one .htaccess file on a server. In fact, you can have one .htaccess file per folder/directory.
When you put the file in a directory, the rules mentioned in it are applicable only to all the files and sub-directories in the directory.
When you put the file in the root directory, the rules mentioned in it are applicable to all the files and directories on the server.
To use mod_rewrite rules, a .htaccess file must contain the following two lines:
Options +FollowSymLinks
RewriteEngine on
How to Block Referrer Spam in GA3 via Regex and RewriteCond
Once you have identified spam referrers, block them from visiting your website again.
Since the bot visit is recorded in your server log, you can block such bots through the .htaccess file (or equivalent).
Following are the various methods you can use to block referrer spam:
Block the referrer used by a spambot
Block the IP address used by the spam bot
Block the IP address range used by a spam bot
Block the user agents used by spambots
Method #1: Block the referrer used by a spam bot
Access your .htaccess file and add the following code to block all http and https referrals from a spambot like “blackhatworth.com” and all subdomains of “blackhatworth.com“:
Create similar code to block the referrers used by other spambots.
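A referrer rule of this kind depends on a regular expression that covers both http and https, as well as every subdomain. Before you put such a pattern on a live server, you can try it out in JavaScript. The pattern below is my own assumption of what that regex could look like, and the test URLs are made up.

```javascript
// Assumed pattern: http or https, any number of subdomain labels,
// then the spam domain, then either a slash or the end of the string.
const spamReferrer = /^https?:\/\/([^.\/]+\.)*blackhatworth\.com(?:\/|$)/i;

const referrerChecks = {
  bareDomain: spamReferrer.test("http://blackhatworth.com/"),
  www: spamReferrer.test("https://www.blackhatworth.com/page"),
  deepSubdomain: spamReferrer.test("https://shop.eu.blackhatworth.com"),
  lookalike: spamReferrer.test("https://notblackhatworth.com/"),
  inQueryString: spamReferrer.test("https://example.com/?ref=blackhatworth.com")
};

console.log(referrerChecks);
```

Only the first three referrers match. The lookalike domain and the query-string mention are left alone, which is what you want from a blocking rule.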
Method #2 Block the IP address used by the spam bot
Access your .htaccess file and add code like the one below:
RewriteEngine On
Options +FollowSymlinks
Order Deny,Allow
Deny from 234.45.12.33
Note: Do not copy-paste this code into your .htaccess, it won’t work. This is just an example to show you how to block an IP address in the .htaccess file. Spambots can come from many different IP addresses. So you need to keep adding IP addresses used by the spambots affecting your website.
Method #3: Block the IP address range used by a spam bot
If you are sure that a particular range of IP addresses is being used by spam bots then you can block the whole IP address range like the one below:
RewriteEngine On
Options +FollowSymlinks
Order Allow,Deny
Allow from all
Deny from 76.149.24.0/24
Here 76.149.24.0/24 is a CIDR range.
CIDR is a method used for representing a range of IP addresses.
Blocking by CIDR is more effective than blocking by individual IP addresses as it takes less space on your server.
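To get a feel for what a CIDR range covers, here is a minimal JavaScript sketch for IPv4 addresses. The function names ipToInt and inCidr are my own. The /24 at the end means that the first 24 bits of the address are fixed, which leaves 256 addresses in the range.

```javascript
// Convert a dotted IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split(".").reduce(function (acc, octet) {
    return (acc * 256 + Number(octet)) >>> 0;
  }, 0);
}

// Check whether an IPv4 address falls inside a CIDR range.
function inCidr(ip, cidr) {
  const parts = cidr.split("/");
  const bits = Number(parts[1]);
  // Build a mask with the first 'bits' bits set (a /0 range matches everything).
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(parts[0]) & mask) >>> 0);
}

const insideRange = inCidr("76.149.24.7", "76.149.24.0/24");
const outsideRange = inCidr("76.149.25.7", "76.149.24.0/24");

console.log(insideRange, outsideRange);
```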
Method #4: Block the user agents used by a spam bot
Go through your server log files once a week and find and ban malicious user agents (user agents used by spambots).
Blocked user agents cannot access your website.
You can block rogue user agents like the one below:
RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F,L]
A simple search on Google can give you a big list of several websites that maintain records of known rogue user agents.
Other use cases of Regex (Regex in SEO)
Besides Google Analytics and Google Tag Manager, regex is widely used in Search Engine Optimization (SEO).
The following are the advantages of using regex in SEO:
1. You can convert long ugly dynamic URLs into SEO-friendly URLs.
2. You can apply the correct redirects.
3. Prevent people from hotlinking your images
4. Block spam bots
5. Resolve canonical URL issues
6. Resolve duplicate content issues (to an extent)
7. Deliver geo-specific content based on the IP address
Example-1: Redirect all requests for pages in the media folder to a new page ‘media.html’.
RewriteRule ^media/ /media.html [R=301,L]
Example-2: Redirect oldaddress.html page to newaddress.html page
A 301 redirect of this kind permanently redirects oldaddress.html to newaddress.html. So whenever a search engine or a visitor requests oldaddress.html, they are automatically redirected to newaddress.html.
Example-9: Convert Dynamic URL into Static Looking SEO friendly URL
This code will redirect https://www.example.com/productdescription.php?keyval=25&keyval2=62 to https://www.example.com/whiteboard-accessories.php
Note: You need to put a question mark (?) at the end of the substitution URL, otherwise query string will be appended at the end of the substitution URL.
Create a web page that you want to display as your custom 404 page say custom404.php and then upload your webpage to the root directory. Now add the following code to your .htaccess file:
Options +FollowSymLinks
RewriteEngine on
ErrorDocument 404 https://www.mywebsite.com/custom404.php
Example-12: Block an IP address from accessing your website
Add the following code in your .htaccess file:
Options +FollowSymLinks
RewriteEngine on
Order Deny,Allow
Deny from 61.16.153.67
If you want to block two or more IP addresses:
Options +FollowSymLinks
RewriteEngine on
Order Deny,Allow
Deny from 61.16.153.67
Deny from 124.202.86.42
Example-13: Resolve the Hot Linking Issue
Hot-linking means direct linking to your website file (images, videos, etc). By preventing hot-linking, you can save your server bandwidth.
Replace ‘mywebsite’ with your website name and then use a hotlinking checker tool to find out whether your files (images, videos, etc.) can be hot-linked or not.
Example-14: Enable proxy caching for static resources
Add the following code to your .htaccess file:
<FilesMatch "\.(gif|jpe?g|png)$">
Header set Cache-Control "public"
</FilesMatch>
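The pattern inside FilesMatch is an ordinary regular expression, so you can check which file names it covers with a quick JavaScript sketch. Apache uses its own regex engine rather than JavaScript, but a pattern this simple should behave the same way in both. The file names below are made up.

```javascript
// Same pattern as in the FilesMatch directive.
const staticImage = /\.(gif|jpe?g|png)$/;

const cacheChecks = {
  jpg: staticImage.test("photo.jpg"),       // 'e?' makes the 'e' optional
  jpeg: staticImage.test("photo.jpeg"),
  png: staticImage.test("logo.png"),
  css: staticImage.test("style.css"),
  backup: staticImage.test("photo.jpg.bak") // '$' requires the extension at the end
};

console.log(cacheChecks);
```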
Frequently Asked Questions About Regex in Google Analytics & Google Tag Manager
What is a regular expression?
It is an expression that is used to check for a pattern in a string. For e.g. ^Colou?r$ is a regular expression that matches both the string: ‘color’ and ‘colour’. A regex is made up of characters and metacharacters.
What are metacharacters?
These are the characters that have special meanings in the regex. They are the building blocks of a regex. For e.g. ^, (), {}, $, +, * etc.
What is a regex engine?
Implementation of regex functionality using a particular type of syntax is called a regex engine.
There are many types of regex engines available. The most popular among them are:
1) PCRE (PHP)
2) JavaScript
3) Python
4) Golang
Different regex engines support different types of syntax, and the meaning of metacharacters may change depending on the regex engine being used. Thus, a regular expression considered valid under one regex engine may not be considered valid under another.
Whenever you test a regex using a regex tester tool, you can select the flavour under which you want to test your regular expression.
Which regex engine is used by Google Analytics and Google Tag Manager?
The regex engine used by Google Analytics and Tag Manager is JavaScript. So always select ‘JavaScript’ as the Flavor before testing your regular expressions for GA/GTM.
What Are the Advantages of using REGEX in Google Analytics?
There are many cases where regular expressions are very useful in Google Analytics. Some of such cases are:
#1. Setting up a goal which should match multiple goal pages instead of one.
#2. Setting up a funnel in which a funnel step should match multiple pages instead of one.
#3. Excluding traffic from an IP address range via filters.
#4. Setting up complex custom segments like the segments which can filter out branded keywords.
#5. Understanding the commercial value of long-tail keywords.
About the Author
Himanshu Sharma
Founder, OptimizeSmart.com
Over 15 years of experience in digital analytics and marketing
Author of four best-selling books on digital analytics and conversion optimization
Nominated for Digital Analytics Association Awards for Excellence
Runs one of the most popular blogs in the world on digital analytics
Consultant to many small and large businesses over the past decade