Google Analytics and Google Tag Manager Regex (Regular Expressions) Guide

In this article, I am going to explain the building blocks of Regular Expression (or REGEX), so that you can understand them and use them in Google Analytics and Google Tag Manager.

Table of contents

#1 What is Regular Expression?
#2 What are Metacharacters?
#3 Types/flavors of regular expressions (aka Regex Engines)
#4 What are the advantages of using REGEX in Google Analytics?
#5 What are the advantages of using REGEX in Google Tag Manager?
#6 Building Blocks of a Regular Expression
#7 Other Meta characters in Regex
#8 Escaping Character – back slash \
#9 Caret ^
#10 Dollar $
#11 Square Bracket []
#12 Parenthesis ()
#13 Question mark ?
#14 Plus +
#15 Multiply *
#16 Dot .
#17 Pipe |
#18 Exclamation !
#19 Curly Brackets {}
#20 White Spaces
#21 Inverting regex in JavaScript
#22 More Regex Examples
#23 Testing Regular Expressions (REGEX)
#24 Introduction to mod_rewrite
#25 Types of Configuration Directives
#26 RewriteEngine
#27 RewriteRule
#28 RewriteCond
#29 Introduction to .htaccess file
#30 How to block referrer spam in Google Analytics via Regex and RewriteCond
#31 Other use cases of regex (Regex in SEO)

What is Regular Expression?


It is an expression which is used to check for a pattern in a string.

For e.g. ^Colou?r$ is a regular expression which matches both the strings ‘Color’ and ‘Colour’.

A regex is made up of characters and metacharacters.

Note: The regular expressions used in this article are written for JavaScript.
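As a quick illustration, the example pattern can be tried in JavaScript with the ‘test’ method. The sample strings below are made up for this sketch. Note that matching is case-sensitive by default, so a lowercase ‘color’ does not match ^Colou?r$.

```javascript
// The example pattern written as a JavaScript regex literal.
var colourRegex = /^Colou?r$/;

// Made-up sample strings, for illustration only.
var samples = ['Color', 'Colour', 'Colouur', 'color'];

// Keep only the strings that the pattern matches.
var matched = samples.filter(function (s) {
  return colourRegex.test(s);
});

console.log(matched); // [ 'Color', 'Colour' ]
```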

 


What are Metacharacters?


These are the characters which have special meaning in regex.

They are the building blocks of a regex.

For e.g. [], ^, (), {}, $, +, * etc.

 


Types/flavors of regular expressions (aka Regex Engines)


An implementation of regex functionality that uses a particular syntax is called a regex engine.

There are many types of regex engines available.

The most popular among them are:

  1. PCRE (PHP)
  2. JavaScript
  3. Python
  4. Golang

Different regex engines support different syntax, and the meaning of a metacharacter may change depending upon the regex engine being used.

Thus a regular expression which is considered valid under one regex engine may not be considered valid under another regex engine.

Whenever you test a regex using a regex tester tool, you get the option to select the flavor under which you want to test your regular expression.

Since the regex engine used by Google Analytics and Google Tag Manager is ‘JavaScript’, you should always select ‘JavaScript’ as the Flavor before testing your regular expressions for GA/GTM.

 


What are the advantages of using REGEX in Google Analytics?


There are many cases where regular expressions are very useful in Google Analytics. Some of such cases are:

#1. Setting up a goal which should match multiple goal pages instead of one:

#2. Setting up a funnel in which a funnel step should match multiple pages instead of one.

In fact, when you set up a funnel, all URLs are treated as regular expressions.

#3. Excluding traffic from an IP address range via filters

In fact, there are many filters which require regular expressions.

Big organizations generally own a range of IP addresses.

Therefore, to exclude an organization’s internal traffic you need to specify an IP range using regex:

#4. Setting up complex custom segments like the segments which can filter out branded keywords:

#5. Understanding the commercial value of long tail keywords:

#6. Rewriting URLs in Google Analytics reports.

For example appending hostname to the request URI:

You can also rewrite URLs in Google Analytics reports with ‘search and replace’ advanced filter.

This comes handy when your website has very long ugly dynamic URLs and you can’t figure out what the page is all about just by looking at its URL.

So for example with ‘Search & Replace’ advanced filter you can ask GA to report the following URL:

https://www.abc.com/fder/?catg=2341&pid=428

as

https://www.abc.com/outdoor/fleeces

#7. Filtering data based on complex patterns within the Google Analytics reporting interface.

For example following regex can segment all the traffic coming from social media sites: 

twitter\.com|facebook\.com|linkedin\.com|plus\.google\.com|t\.co|bit\.ly|reddit\.com
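The same pattern can be checked outside Google Analytics before you rely on it. The sketch below is only an illustration: the referrer values are hypothetical and the case-insensitive ‘i’ flag is an assumption, not something the report filter requires.

```javascript
// The social-media pattern from above; the "i" flag is an assumption for this sketch.
var socialRegex = /twitter\.com|facebook\.com|linkedin\.com|plus\.google\.com|t\.co|bit\.ly|reddit\.com/i;

// Hypothetical referrer host names, for illustration only.
var referrers = ['m.facebook.com', 'news.example.org', 't.co', 'www.reddit.com'];

// Keep the referrers that the pattern classifies as social.
var socialReferrers = referrers.filter(function (r) {
  return socialRegex.test(r);
});

console.log(socialReferrers); // [ 'm.facebook.com', 't.co', 'www.reddit.com' ]
```

Because the alternatives are not anchored, t\.co also matches any host that merely contains ‘t.co’, for example ‘microsoft.com’. Testing against a few known non-social hosts helps to catch this kind of over-matching.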

#8 Finding referrer spam in Google Analytics.

For example you can use the following regex (not foolproof) to filter out all the spam referrers in the ‘Referrals’ report:

button|ilovevitaly|darodar|hulfingtonpost|ranksonic|[0-9]{1,3}\.[0-9]{1,3}|website

#9 Blocking spam referrers through custom advanced filter in Google Analytics.

For example following filter should block all of the traffic from spam referrers you identified:

#10 Using regular expressions while creating content groups in Google Analytics:

#11 Using regular expressions while creating Channel grouping in Google Analytics:

#12 Using regular expressions while tracking site search without a query parameter.

#13 Using regular expressions in debugging Google Analytics tracking issues.

 


What are the advantages of using REGEX in Google Tag Manager?


Through regular expressions you can:

#1 Set up complex triggers in GTM:

#2 Use the regex table variable in Google Tag Manager.

#3 Use regex in a custom JavaScript variable, for example when tracking site search without a query parameter in Google Tag Manager.

 


Building Blocks of a Regular Expression


regex cheatsheet for Google Analytics

Other Meta characters in Regex

other meta characters

 


Escaping Character – back slash \


‘\’ is the escaping character which is used to escape from the normal way a subsequent character is interpreted.

Through the escaping character you can turn a regular character into a metacharacter, or turn a metacharacter into a regular character.

For example:

Forward slash (/) has a special meaning in a JavaScript regex literal. It is used to mark the beginning and end of the regular expression.

For example:

var a = /colou?r/;

If you want a regex literal to treat a forward slash as a forward slash and not as a delimiter, then you need to use it along with the escaping character like this: \/

So if you want to check for a pattern, say /shop, in the string /shop/collection/men/

then your regex should be: \/shop

If you use /shop unescaped inside a JavaScript regex literal then it won’t work, because the / would be treated as the delimiter which ends the literal instead of a regular forward slash.

Another example:

‘n’ is a regular character. But if you add the escaping character before it then it becomes a metacharacter: \n, which is the newline character.

So if you want to check for the literal two-character sequence \n (a backslash followed by ‘n’) in the string abcd\n3456

then your regex should be: abcd\\n3456

If you use the regex abcd\n3456 then it won’t match the string abcd\n3456, because \n would be treated as a newline character instead of a backslash followed by ‘n’.

Another example:

‘?’ is a metacharacter. To make it a regular character, you need to add the escaping character before it: \?

So if you want to check for a question mark in the string colou?r

then your regex should be: colou\?r

If you use the regex colou?r then it would match the string color or colour and not colou?r, because ? would then be treated as a metacharacter.
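The three escaping cases above can be verified with a short JavaScript sketch. The variable names are made up for illustration. Keep in mind that inside a JavaScript string literal, '\\n' is a backslash followed by ‘n’, while '\n' is a real newline.

```javascript
// Inside a regex literal the forward slash must be escaped.
var shopRegex = /\/shop/;
console.log(shopRegex.test('/shop/collection/men/')); // true

// \\ in the regex matches one literal backslash, so this pattern looks for
// the characters a, b, c, d, backslash, n, 3, 4, 5, 6.
var backslashN = /abcd\\n3456/;
console.log(backslashN.test('abcd\\n3456')); // true  (string holds a real backslash)
console.log(backslashN.test('abcd\n3456'));  // false (string holds a newline)

// \? matches a literal question mark; a bare ? makes the 'u' optional.
var literalQuestion = /colou\?r/;
var optionalU = /colou?r/;
console.log(literalQuestion.test('colou?r')); // true
console.log(optionalU.test('colou?r'));       // false
console.log(optionalU.test('colour'));        // true
```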

 


Caret  ^


‘^’ – This is known as ‘Caret’ and is used to anchor a pattern to the beginning of the string being checked.

^\/Colou?r => Check for a pattern which starts with ‘/Color’ or ‘/Colour’. Example:

/Colour/?proid=3456/review

/Color-red/?proid=3456/review

^\/Nov(ember)? => Check for a pattern which starts with ‘/Nov’ or ‘/November’. Example:

/November-sales/?proid=3456/review

/Nov-sales/?proid=3456/review

^\/elearning\.html => Check for a pattern which starts with ‘/elearning.html’. Example:

/elearning.html/?proid=3456/review

^\/.*\.php => Check for a pattern which starts with any file with .php extension. Example:

/elearning.php/color/?proid=3456/review

/games.php/?proid=3456/

/a1.php/color/?proid=3456&gclid=153dwf3533

^\/product-price\.php => Check for a pattern which starts with ‘/product-price.php’. Example:

/product-price.php?proid=123&cid=2142

/product-price.php?cid=2142&gclid=442352df
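To see what the caret changes, the sketch below compares an anchored pattern with the same pattern without the caret. The third path is a made-up example which contains ‘/Colour’ but does not start with it.

```javascript
// Anchored: the path must start with /Color or /Colour.
var startsWithColour = /^\/Colou?r/;

// Unanchored: /Color or /Colour may appear anywhere in the path.
var containsColour = /\/Colou?r/;

var paths = [
  '/Colour/?proid=3456/review',
  '/Color-red/?proid=3456/review',
  '/shop/Colour/?proid=3456' // hypothetical path for illustration
];

var anchoredMatches = paths.filter(function (p) {
  return startsWithColour.test(p);
});

console.log(anchoredMatches);               // the first two paths only
console.log(containsColour.test(paths[2])); // true
```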

 

Caret also means NOT when used after the opening square bracket.

[^a] => Check for any single character other than the lowercase letter ‘a’.

For example: the regex product-[^a] will match:

/shop/men/sales/product-b

/shop/men/sales/product-c

[^B] => Check for any single character other than the uppercase letter ‘B’.

For example: the regex product-[^B] will match:

/shop/men/sales/product-b

/shop/men/sales/product-c

[^1] => Check for any single character other than the number ‘1’.

For example: the regex proid=[^1] will match:

/men/product-b?proid=3456&gclid=153dwf3533

but will not match:

/men/product-b?proid=1456&gclid=153dwf3533

[^ab] => Check for any single character other than the lower case letters ‘a’ and ‘b’.

For example: the regex location=[^ab] will match:

/shop/collection/prodID=141?location=canada

but will not match:

/shop/collection/prodID=141?location=america

/shop/collection/prodID=141?location=bermuda

[^aB] => Check for any single character other than the lower case letter ‘a’ and uppercase letter ‘B’.

[^1B] => Check for any single character other than the number ‘1’ and uppercase letter ‘B’

[^Dog] => Check for any single character other than the following: uppercase letter ‘D’, lowercase letter ‘o’ and lowercase letter ‘g’.

For example: the regex location=[^Dog] will match:

/shop/collection/prodID=141?location=canada

/shop/collection/prodID=141?location=denmark

but will not match:

/shop/collection/prodID=141?location=Denver

/shop/collection/prodID=141?location=ontario

/shop/collection/prodID=141?location=greenland

[^123b] => Check for any single character other than the following characters: number ‘1’, number ‘2’, number ‘3’ and lowercase letter ‘b’.

[^1-3] => Check for any single character other than the following: number ‘1’, number ‘2’ and number ‘3’.

For example: the regex prodID=[^1-3] will match:

/shop/collection/prodID=45321&cid=1313

/shop/collection/prodID=5321&cid=13442

but will not match:

/shop/collection/prodID=12321&cid=1313

/shop/collection/prodID=2321&cid=1313

/shop/collection/prodID=321&cid=1313

[^0-9] => Check for any single character which is not a number (digit).

For example: the regex de\/[^0-9] will match all those pages in the de/ folder whose name doesn’t start with a number:

/de/school-london

/de/general/

but will not match:

/de/12fggtyooo

[^a-z] => Check for any single character which is not a lower case letter.

For example: the regex de\/[^a-z] will match all those pages in the de/ folder whose name doesn’t start with a lower case letter:

/de/1london-school
/de/?productid=423543

but will not match:

/de/school/london

[^A-Z] => Check for any single character which is not an uppercase letter.
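The negated character sets above can be checked in JavaScript. This sketch reuses the location=[^Dog] example and its URLs from this section; only the variable names are new.

```javascript
// [^Dog] matches one character that is not 'D', 'o' or 'g'.
var notDog = /location=[^Dog]/;

var urls = [
  '/shop/collection/prodID=141?location=canada',
  '/shop/collection/prodID=141?location=denmark',
  '/shop/collection/prodID=141?location=Denver',
  '/shop/collection/prodID=141?location=ontario',
  '/shop/collection/prodID=141?location=greenland'
];

var allowed = urls.filter(function (u) {
  return notDog.test(u);
});

console.log(allowed); // only the canada and denmark URLs
```

Note that the set is case-sensitive: ‘denmark’ matches because a lowercase ‘d’ is not in the set, while ‘Denver’ does not.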

 


Dollar  $


‘$’ – It is used to anchor a pattern to the end of the string being checked (or to the end of a line). For e.g.

Colou?r$ => Check for a pattern which ends with ‘Color’ or ‘Colour’

Nov(ember)?$ => Check for a pattern which ends with ‘Nov’ or ‘November’

elearning\.html$ => Check for a pattern which ends with ‘elearning.html’

\.php$ => Check for a pattern which ends with .php

product-price\.php$ => Check for a pattern which ends with ‘product-price.php’
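A small sketch of the ‘ends with’ behaviour, using made-up request paths. It shows that a query string after .php prevents a match, because the string no longer ends with ‘.php’.

```javascript
// Matches only when the string ends with ".php".
var endsWithPhp = /\.php$/;

// Hypothetical request paths, for illustration only.
var requestPaths = ['/games.php', '/games.php?proid=3456', '/games.phps'];

var phpPaths = requestPaths.filter(function (p) {
  return endsWithPhp.test(p);
});

console.log(phpPaths); // [ '/games.php' ]
```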

 


Square Bracket  []


‘[]’ – This square bracket is used to check for any single character in the character set specified in []. For e.g:

[a] => Check for a single character which is a lowercase letter ‘a’.

[ab] => Check for a single character which is either a lower case letter ‘a’ or ‘b’.

[aB] => Check for a single character which is either a lower case letter ‘a’ or uppercase letter ‘B’

[1B] => Check for a single character which is either a number ‘1’ or an uppercase letter ‘B’.

[Dog] => Check for a single character which can be anyone of the following: uppercase letter ‘D’, lower case letter ‘o’ or lowercase letter ‘g’.

[123b] => Check for a single character which can be anyone of the following: number ‘1’, number ‘2’, number ‘3’ or lowercase letter ‘b’.

[1-3] => Check for a single character which can be any one number from 1, 2 and 3.

[0-9] => Check for a single character which is a number.

[a-d] => Check for a single character which can be any one of the following lower case letter: ‘a’, ‘b’, ‘c’ or ‘d’.

[a-z] => Check for a single character which is a lower case letter.

[A-Z] => Check for a single character which is a upper case letter.

[A-T] => Check for a single character which can be any uppercase letter from ‘A’ to ‘T’.

[home.php] => Check for a single character which can be any one of the following characters: lowercase letter ‘h’, ‘o’, ‘m’, ‘e’ or ‘p’, or a literal dot ‘.’. Inside square brackets the dot is not a metacharacter, and repeating a character in the set has no extra effect.

Note: if you want to check for a letter regardless of its case (upper case or lower case) then use [a-zA-Z]

 


Parenthesis ()


‘()’ – These are known as parentheses and are used to group characters so that they are checked together as a string. For e.g.

(a) => Check for string ‘a’

(ab) => Check for string ‘ab’

(dog) => Check for string ‘dog’

(dog123) => Check for string ‘dog123’

(0-9) => Check for string ‘0-9’

(A-Z) => Check for string ‘A-Z’

(a-z) => Check for string ‘a-z’

(123dog588) => Check for string ‘123dog588’

Note: () is also used to create and store variables (known as capture groups). For e.g. ^(.*)$
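The ‘stored variables’ mentioned in the note can be read back in JavaScript through the array returned by ‘match’. The pattern below is a hypothetical one written for this sketch; index 0 holds the whole match and indexes 1 and 2 hold what each pair of parentheses captured.

```javascript
// Two capture groups: the folder after /shop/ and the folder after that.
var shopPath = /^\/shop\/([a-z]+)\/([a-z]+)\/$/;

var result = '/shop/collection/men/'.match(shopPath);

console.log(result[0]); // '/shop/collection/men/'
console.log(result[1]); // 'collection'
console.log(result[2]); // 'men'
```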

 


Question mark  ?


‘?’ is used to check for zero or one occurrence of the preceding character. For e.g.

[a]? => Check for zero or one occurrence of lowercase letter ‘a’.

[dog]? => Check for zero or one occurrence of lowercase letter ‘d’, ‘o’ or ‘g’.

[^dog]? => Check for zero or one occurrence of a character which is not the lowercase letter ‘d’, ‘o’ or ‘g’.

[0-9]? => Check for zero or one occurrence of a number

[^a-z]? => Check for zero or one occurrence of a character which is not a lower case letter.

^colou?r$ => check for color or colour.

^Nov(ember)? 28(th)?$ => check for ‘Nov 28’, ‘November 28’, ‘Nov 28th’ and ‘November 28th’.

Note: ? when used inside a regular expression makes the preceding letter or group of letters optional.

For e.g. the regular expression: ^colou?r$ matches both ‘color’ and ‘colour’. Similarly, the regular expression: ^Nov(ember)? 28(th)?$ matches: ‘Nov 28’, ‘November 28’, ‘Nov 28th’ and ‘November 28th’.
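The sketch below checks an optional group in JavaScript. It uses the pattern ^Nov(ember)? 28(th)?$, that is, with a ‘?’ after (ember) and a literal space before 28, since both are needed for all four date strings to match. The fifth string is a made-up negative example.

```javascript
// (ember)? and (th)? are optional groups; the space before 28 is literal.
var nov28 = /^Nov(ember)? 28(th)?$/;

var dates = ['Nov 28', 'November 28', 'Nov 28th', 'November 28th', 'Novem 28'];

var matchedDates = dates.filter(function (d) {
  return nov28.test(d);
});

console.log(matchedDates.length); // 4 (everything except 'Novem 28')
```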

 


Plus  +


‘+’ is used to check for one or more occurrences of the preceding character. For e.g.

[a]+ => Check for one or more occurrences of lowercase letter ‘a’.

[dog]+ => Check for one or more occurrences of letters ‘d’, ‘o’ or ‘g’.

[548]+ => Check for one or more occurrences of numbers ‘5’, ‘4’ or ‘8’.

[0-9]+ => Check for one or more numbers

[a-z]+ => Check for one or more lower case letters

[^a-z]+ => Check for one or more characters which are not lowercase letters.

[a-zA-Z]+ => Check for any combination of uppercase and lowercase letters.

[a-z0-9]+ => Check for any combination of lowercase letters and numbers.

[A-Z0-9]+ => Check for any combination of uppercase letters and numbers.

[^9]+ => Check for one or more character which is not the number 9.

 


Multiply *


‘*’ is used to check for any number of occurrences (including zero occurrences) of the preceding character.

For example, 31* would match 3, 31, 311, 3111, 31111 etc.
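The difference between ‘*’ (zero or more) and ‘+’ (one or more) shows up when the preceding character is absent. The sketch below anchors both patterns with ^ and $ so that the whole string has to match; the sample values are made up.

```javascript
// 1* allows zero or more '1' characters after the '3'.
var star = /^31*$/;
// 1+ requires at least one '1' after the '3'.
var plus = /^31+$/;

var values = ['3', '31', '311', '32'];

var starMatches = values.filter(function (v) { return star.test(v); });
var plusMatches = values.filter(function (v) { return plus.test(v); });

console.log(starMatches); // [ '3', '31', '311' ]
console.log(plusMatches); // [ '31', '311' ]
```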

 


Dot .


‘.’ is used to check for a single character (any character that can be typed via keyboard other than a line break character (\n)).

For example the regular expression: Action ., Scene2 would match:

  • Action 1, Scene2
  • Action A, Scene2
  • Action 9, Scene2
  • Action &, Scene2

but not

  • Action 10, Scene2
  • Action AB, Scene2

 


Pipe |


‘|’ is the logical OR . For example:

(His|Her) => Check for the string ‘His’ or ‘Her’.

His|Her => Check for the string ‘His’ or ‘Her’. For example, the regex his|her will match:

  1. this is his book
  2. this is her book
  3. his or her
  4. her or his
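Parentheses matter once the alternatives are part of a longer pattern. The following sketch uses two hypothetical patterns to show that, without grouping, the pipe splits the entire expression, anchors included.

```javascript
// Grouped: the whole string must be 'his book' or 'her book'.
var grouped = /^(his|her) book$/;

// Ungrouped: read as "starts with his" OR "ends with her book".
var ungrouped = /^his|her book$/;

console.log(grouped.test('his book'));       // true
console.log(grouped.test('his cat'));        // false
console.log(ungrouped.test('his cat'));      // true (starts with 'his')
console.log(ungrouped.test('not her book')); // true (ends with 'her book')
```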

 


Exclamation !


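‘!’ – It is a logical NOT, but it is not a metacharacter in JavaScript regular expressions. It negates a pattern only when it is placed at the very beginning of a mod_rewrite rule or condition pattern (in JavaScript it has a special meaning only as part of the negative lookahead syntax ‘(?!’). Inside an ordinary regex it is a regular character. For e.g.

  1. (!abc) => In a plain regex this checks for the literal string ‘!abc’. It does not mean ‘not abc’.
  2. [!0-9] => In a plain regex this checks for a single character which is either ‘!’ or a number. To check for a character which is not a number, use [^0-9].
  3. [!a-z] => In a plain regex this checks for a single character which is either ‘!’ or a lower case letter. To check for a character which is not a lower case letter, use [^a-z].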

 


Curly Brackets {}


{} is used to check for a specific number (or range) of occurrences of the preceding character.

It is just like the meta character ‘+’ but it provides more control on the number of occurrences of the preceding character you want to match.

For example:

1{1} => check for 1 occurrence of the character  ‘1’. This regex will match 1

1{2}  => check for 2 occurrences of the character  ‘1’. This regex will match 11

1{3} =>check for 3 occurrences of the character  ‘1’. This regex will match 111

1{4}  => check for 4 occurrences of the character  ‘1’. This regex will match 1111

1{1,4}  =>check for 1 to 4 occurrences of the character  ‘1’. This regex will match 1,11, 111, 1111

[0-9]{2}  => check for 2 occurrences of a number or in other words, check for two digits number like 12

[0-9]{3}  => check for 3 occurrences of a number or in other words check for three digits number like 123

[0-9]{4} => check for 4 digits number like 1234

[0-9]{1,4} => check for 1 to 4 digits number.

 

[a]{1} => check for 1 occurrence of the character  ‘a’. This regex will match a

[a]{2}  => check for 2 occurrences of the character  ‘a’. This regex will match aa

[a]{3} =>check for 3 occurrences of the character  ‘a’. This regex will match aaa

[a]{4}  => check for 4 occurrences of the character  ‘a’. This regex will match aaaa

[a]{1,4}  =>check for 1 to 4 occurrences of the character  ‘a’. This regex will match a,aa,aaa,aaaa

[a-z]{2}  => check for 2 occurrences of a lower case letter. This regex will match aa, ab, xy etc

[A-Z]{3}  => check for 3 occurrences of an upper case letter. This regex will match AAA, ABC, XYZ etc

[a-zA-Z]{2} => check for 2 occurrences of a letter (doesn’t matter whether it is upper case or lower case). This regex will match aa, aA, Aa, AA etc

[a-zA-Z]{1,4} => check for 1 to 4 occurrences of a letter (doesn’t matter whether it is upper case or lower case). This regex will match a, aB, Abc, AAAa etc

 

(rock){1} => check for 1 occurrence of the string ‘rock’. This regex will match: rock

(rock){2} => check for 2 occurrences of the string ‘rock’. This regex will match: rockrock

(rock){3} => check for 3 occurrences of the string ‘rock’. This regex will match: rockrockrock

(rock){1,4} => check for 1 to 4 occurrences of the string ‘rock’. This regex will match: rock, rockrock, rockrockrock, rockrockrockrock
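A short sketch of the counting behaviour in JavaScript, using made-up inputs. The first two patterns are anchored with ^ and $ so that the count applies to the whole string; the last line shows that an unanchored 1{2} is also found inside a longer run of ones.

```javascript
// Whole string must be 1 to 4 digits.
var oneToFourDigits = /^[0-9]{1,4}$/;
var inputs = ['7', '1234', '12345', '12a'];

var validInputs = inputs.filter(function (v) {
  return oneToFourDigits.test(v);
});
console.log(validInputs); // [ '7', '1234' ]

// Whole string must be exactly two repetitions of 'rock'.
var doubleRock = /^(rock){2}$/;
console.log(doubleRock.test('rockrock')); // true
console.log(doubleRock.test('rock'));     // false

// Unanchored: '11' is found inside '111'.
console.log(/1{2}/.test('111')); // true
```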

 


White Spaces


To create a white space in a regular expression, just use the white space. For e.g.

(Himanshu Sharma) => Check for the string ‘Himanshu Sharma’

 


Inverting Regex in JavaScript


Inverting a regex means inverting its meaning. You can invert a regex in JavaScript by using positive and negative lookaheads.

Use positive lookahead if you want to match something that is followed by something else.

Use negative lookahead if you want to match something not followed by something else.

Positive Lookahead starts with (?= and ends with )

Negative Lookahead starts with (?! and ends with )

For example: the regex de\/[^a-z] will match all those pages in the de/ folder whose name doesn’t start with a lower case letter:

/de/1london-school
/de/?productid=423543

but will not match:

/de/school/london

The invert of this regular expression would be: match all those pages in the de/ folder whose name starts with a lower case letter:

For example: the regex de\/(?![^a-z]) will match:

/de/school/london

but will not match:

/de/1london-school
/de/?productid=423543

Note: Older JavaScript engines support only lookaheads and not lookbehinds (lookbehind was added to the language in ECMAScript 2018). Google Analytics doesn’t support either lookahead or lookbehind.
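The inverted pattern de\/(?![^a-z]) can be tried against the same three paths in JavaScript. The sketch also compares it with the simpler pattern de\/[a-z], which is an alternative suggested here for illustration rather than something taken from the examples above.

```javascript
// Negative lookahead: after "de/", the next character must NOT be a non-lowercase character.
var invertedWithLookahead = /de\/(?![^a-z])/;

// Simpler alternative (an assumption for comparison): require a lowercase letter directly.
var simpleAlternative = /de\/[a-z]/;

var pages = ['/de/school/london', '/de/1london-school', '/de/?productid=423543'];

var lookaheadMatches = pages.filter(function (p) {
  return invertedWithLookahead.test(p);
});

console.log(lookaheadMatches); // [ '/de/school/london' ]
```

The two patterns differ on one edge case: for the bare path ‘/de/’ the lookahead version matches (nothing follows, so no forbidden character is found), while de\/[a-z] does not.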

 


More Regex Examples


^(.*)\.html$ => Check for any number of characters before .html and store them in a variable.

^dog$ => Check for the string ‘dog’

^a+$ => Check for one or more occurrences of a lower case letter ‘a’

^(abc)+$ => Check for one or more occurrences of the string ‘abc’.

^[a-z]+$ => Check for one or more occurrences of a lower case letter.

^(abc)*$ => Check for any number of occurrences of the string ‘abc’.

^a*$ => Check for any number of occurrences of the lower case letter ‘a’

#. Find all the files which start with ‘elearning’ and which have the ‘.html’ file extension

^elearning.*\.html$

#. Find all the PHP files

^.*\.php$
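The file-matching examples can be tried in JavaScript as follows. This sketch writes ‘any characters’ as .* (a dot followed by a star), so the patterns used are ^elearning.*\.html$, ^.*\.php$ and ^(.*)\.html$. The file names are made up for illustration.

```javascript
var htmlStartingWithElearning = /^elearning.*\.html$/;
var anyPhpFile = /^.*\.php$/;
var captureBeforeHtml = /^(.*)\.html$/;

// Hypothetical file names, for illustration only.
var files = ['elearning.html', 'elearning-2024.html', 'learning.html', 'index.php', 'elearning.php'];

var htmlFiles = files.filter(function (f) { return htmlStartingWithElearning.test(f); });
var phpFiles = files.filter(function (f) { return anyPhpFile.test(f); });

console.log(htmlFiles); // [ 'elearning.html', 'elearning-2024.html' ]
console.log(phpFiles);  // [ 'index.php', 'elearning.php' ]

// The part before ".html" is stored in capture group 1.
console.log('elearning.html'.match(captureBeforeHtml)[1]); // 'elearning'
```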

 


Testing Regular Expressions (REGEX)


Whether you consider yourself a beginner or advanced in the use of regex, you should always test your regular expressions.

You can test regular expressions through:

  1. RegExp Tester chrome extension
  2. Regex101.com online tool
  3. The advanced table filter on the reporting interface in GA with the Regex option
  4. Preview feature of your Custom Segment in GA
  5. GTM debug console window for testing regex used in triggers and variables.
  6. Using RegExpObject to test regex in GTM during run time.

Testing Regex Method #1: RegExp Tester chrome extension

RegExp Tester is a chrome extension which is used to create and validate regular expressions (or regex):

regular expression checker

Here the highlighted search result (i.e. optimize smart) is the pattern which matches with my regex.

Here the job of my regex is to filter out two-word keyword phrases.

 

Testing Regex Method #2: Regex101.com online tool

Regex101.com is an online tool used for creating and testing regular expressions.

Following is the interface of ‘Regex101’ tool:

Note: Use the ‘JavaScript’ flavour, as Google Analytics accepts JavaScript regular expressions.

 

Testing Regex Method #3: Advanced table reporting filter in GA

You can create and test regex in Google Analytics by using the advanced table filter on the reporting interface with the Regex option:

 

Testing Regex Method #4: Preview feature of Custom Segment in GA

You can create and test regex in Google Analytics by using the preview feature of your custom segment:

 

Testing Regex Method #5: GTM debug console window

For GTM you can use the debug console window to test the regex used in triggers and variables:

 

Testing Regex Method #6: Using ‘RegExp’ to test regex in GTM during run time

RegExp is a regular expression object which is used to store a regular expression in JavaScript.

For example:

var regex = /^\/search\/(.*)/;

Here,

‘regex’ (as in var regex) is a regular expression object which is used to store the regular expression /^\/search\/(.*)/

 

‘test’ and ‘exec’ Methods of the ‘RegExp’ object

Both ‘test’ and ‘exec’ are methods of the ‘RegExp’ object and are often used in Google Tag Manager to test regular expressions at run time.

The ‘test’ method is used to test for a match in a string.

It returns a boolean value: ‘true’ if it finds a match, otherwise it returns ‘false’.

Syntax: RegExpObject.test(string to be searched)

For example:

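function() {
  // Match page paths that start with /search/ and capture the rest.
  var regex = /^\/search\/(.*)/;
  var pagePath = '/search/enhanced ecommerce tracking/';
  if (regex.test(pagePath)) {
    // Index 1 of the exec() result holds the captured search term.
    var searchTerm = regex.exec(pagePath)[1];
    var NewUri = "/search/?s=" + searchTerm;
    return NewUri;
  }
  return false;
}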

 

The ‘exec’ method (as in regex.exec) also tests for a match in a string.

But unlike ‘test’, it returns an array which contains the matched text, if it finds a match.

Otherwise it returns null.

Syntax: RegExpObject.exec(string to be searched)

The array returned by ‘exec’ contains the full matched text followed by the text captured by each pair of parentheses.

So for the regex ^\/search\/(.*) and pagePath = ‘/search/enhanced ecommerce tracking/’

regex.exec(pagePath) = [‘/search/enhanced ecommerce tracking/’, ‘enhanced ecommerce tracking/’]

regex.exec(pagePath)[0] = ‘/search/enhanced ecommerce tracking/’

regex.exec(pagePath)[1] = ‘enhanced ecommerce tracking/’

So when we use regex.exec(pagePath)[1] we can extract the search string from the request URI.
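Outside GTM, the same logic can be run as an ordinary named function. The sketch below is a minimal variant for testing in a browser console or Node.js: the function name ‘rewriteSearchPath’ is made up, and the page path is passed in as a parameter instead of being hard-coded. It calls ‘exec’ once and checks for null, which avoids running the regex twice.

```javascript
// Hypothetical helper: turns /search/<term> into /search/?s=<term>.
function rewriteSearchPath(pagePath) {
  var regex = /^\/search\/(.*)/;
  var match = regex.exec(pagePath); // array on success, null on failure
  if (match === null) {
    return false;
  }
  // match[0] is the full matched text, match[1] is the captured search term.
  return '/search/?s=' + match[1];
}

console.log(rewriteSearchPath('/search/enhanced ecommerce tracking/'));
// '/search/?s=enhanced ecommerce tracking/'
console.log(rewriteSearchPath('/blog/')); // false
```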

 


Introduction to mod_rewrite


It is an Apache module written in the ‘C’ programming language: ‘mod_rewrite.c’.

This module works only with Apache server 1.2 or later and is called from the .htaccess file (ASCII file which contains configuration directives and rules for files and folders).

Through this module you can:

  1. Re-Write URLs
  2. Redirect URLs
  3. Solve Canonical URL issues
  4. Solve Hot linking issues
  5. Block visitors from accessing a particular folder, file or the whole website.
  6. Create custom 403 and 404 pages.
  7. Deliver content on the basis of the IP address.
  8. Block referrer spam in Google Analytics.

Types of Configuration Directives


There are 9 types of configuration directives:

  1. RewriteEngine
  2. RewriteOptions
  3. RewriteLog
  4. RewriteLogLevel
  5. RewriteLock
  6. RewriteMap
  7. RewriteBase
  8. RewriteRule
  9. RewriteCond

But here we will talk about only three directives:

  • RewriteEngine
  • RewriteRule
  • RewriteCond.

I have not found any good use of other directives, in the context of Google Analytics.

 


RewriteEngine


This configuration directive is used to enable or disable the mod_rewrite module.

Syntax: RewriteEngine on/off

Default Value: RewriteEngine off

That’s why, in the .htaccess file, we first enable the mod_rewrite module by adding the following code:

Options +FollowSymLinks
RewriteEngine on

 


RewriteRule


This configuration directive tells the server to interpret the given statement as a rule.

Syntax: RewriteRule <pattern> <substitution> [FLAGS]

Here pattern is a regular expression and substitution is a URL.

FLAGS can be [R], [F], [NC], [QSA], [L], [OR] etc.


[R] =>
Redirect. Its default value is 302. It can be assigned any status code from 300 to 399. For e.g.

RewriteRule ^index\.html$ /index.php [r=301]


[F] =>
Forbidden. It is generally used with a hyphen (-). The hyphen tells the server not to perform any substitution. This flag tells the server not to fulfill the request and to return a ‘403’ response code. For e.g.

RewriteRule ^product-price\.php$ - [F]


[NC] =>
It tells the server to ignore case (uppercase or lowercase) when checking for patterns. For e.g.

RewriteRule ^him.*\.php$ - [NC]


[QSA] =>
Query String append. It tells the server to pass query string from the original URL to the new URL.

[L] => Last rule. This flag tells the server not to process any more rules.

[OR] => Logical OR. This flag is used as logical OR for RewriteCond statements.

 


RewriteCond


This configuration directive tells the server to interpret the given statement as a condition for the rule which immediately follows it.

Syntax: RewriteCond <test string> <condition pattern> [FLAGS]

Here mod_rewrite first matches each URL against the pattern of the RewriteRule.

If the URL does not match the pattern, then mod_rewrite processes the next rule.

If the URL matches the pattern, then mod_rewrite looks for the corresponding RewriteCond.

If no corresponding RewriteCond exists, then the matched URL is replaced by the substitution.

If corresponding RewriteConds exist, then each RewriteCond is processed in the order in which they appear, from top to bottom.

Each RewriteCond is processed by matching its test string against its corresponding condition pattern.

If a test string doesn’t match its condition pattern, then mod_rewrite processes the next rule; otherwise it processes the next RewriteCond.

When all RewriteConds are successfully processed, then the matched URL is replaced by the substitution.
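The processing order described above can be sketched as a small JavaScript model. This is only a simplified illustration of the flow, not how Apache is implemented: the function ‘applyRules’ and the rule objects are made up, test strings are plain strings rather than expanded server variables, and flags such as [OR], [NC] and [L] are ignored.

```javascript
// Simplified model: each rule has a pattern, a substitution and a list of conditions.
function applyRules(url, rules) {
  for (var i = 0; i < rules.length; i++) {
    var rule = rules[i];
    // Step 1: the URL must match the rule pattern, otherwise try the next rule.
    if (!rule.pattern.test(url)) {
      continue;
    }
    // Step 2: every condition must match, checked top to bottom.
    // every() stops at the first failing condition and returns true for an empty list.
    var allConditionsPass = rule.conds.every(function (cond) {
      return cond.pattern.test(cond.testString);
    });
    if (!allConditionsPass) {
      continue;
    }
    // Step 3: replace the matched URL with the substitution.
    return url.replace(rule.pattern, rule.substitution);
  }
  return url; // no rule applied
}

// Hypothetical rule: rewrite index.html only for one visitor IP address.
function makeRules(visitorIp) {
  return [{
    pattern: /^index\.html$/,
    substitution: '/index.php',
    conds: [{ testString: visitorIp, pattern: /^12\.34\.56\.78$/ }]
  }];
}

console.log(applyRules('index.html', makeRules('12.34.56.78'))); // '/index.php'
console.log(applyRules('index.html', makeRules('98.76.54.32'))); // 'index.html'
```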

A test string can be:

1. A simple text
2. RewriteRule back reference
3. RewriteCond back reference
4. Server Variable

RewriteRule Back Reference

It is of the form $N, where N can be any number from 0 to 9. It is used to denote the variable (captured group) which was created in the RewriteRule pattern. For e.g.

RewriteRule ^(.*)$ /index.php/$1 [L]

 

RewriteCond Back Reference

It is of the form %N, where N can be any number from 1 to 9. It is used to denote that variable which was created in the ‘condpattern’ from the last matched ‘RewriteCond’. For e.g.

RewriteCond %{HTTP_HOST} ^(123\.42\.162\.7)$

RewriteCond %1 ^123\.42\.162\.7$

RewriteRule ……………..

 

Server Variable

Syntax: %{Variable_Name}

E.g.

1. %{HTTP_HOST} – This variable returns the host name sent in the request’s Host header.

2. %{HTTP_USER_AGENT} – This variable gives information about user’s operating system and browser.

3. %{QUERY_STRING} – This variable returns query string.

4. %{HTTP_REFERER} – This variable returns the URL of the referer.

5. %{REMOTE_ADDR} – This variable returns the IP address of the visitor (the client making the request).

 


Introduction to .htaccess file


It is an ASCII file which contains configuration directives and rules for files, folders and the whole website.

You can have more than one .htaccess file on a server.

In fact you can have one .htaccess file per folder/directory.

When you put the file in a directory, the rules mentioned in it are applicable only to all the files and sub-directories in the directory.

When you put the file in the root directory, the rules mentioned in it are applicable to all the files and directories on the server.

To use mod_rewrite, a .htaccess file must contain the following two lines:

Options +FollowSymLinks
RewriteEngine on

 


How to block referrer spam in Google Analytics via Regex and RewriteCond


Once you have identified spam referrers, block them from visiting your website again.

Since the bot visit is recorded in your server log, you can block such bots through .htaccess file (or equivalent).

Following are the various methods you can use to block referrer spam:

  1. Block the referrer used by spambot
  2. Block the IP address used by the spam bot
  3. Block the IP address range used by spam bot
  4. Block the user agents used by spambots

Method #1: Block the referrer used by spam bot

Access your .htaccess file and add the following code to block all http and https referrals from a spambot like “blackhatworth.com” and all subdomains of “blackhatworth.com“:

RewriteEngine On

Options +FollowSymlinks

RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*blackhatworth\.com [NC]

RewriteRule .* - [F]

Create similar code to block the referrer used by other spambots.

Method #2 Block the IP address used by the spam bot

Access your .htaccess file and add a code like the one below:

RewriteEngine On

Options +FollowSymlinks

Order Deny,Allow

Deny from 234.45.12.33

Note: Do not copy and paste this code into your .htaccess file; it won’t work as-is. This is just an example to show you how to block an IP address in a .htaccess file. Spambots can come from many different IP addresses, so you need to keep adding the IP addresses used by the spambots affecting your website.

Method #3: Block the IP address range used by spam bot

If you are sure that a particular range of IP addresses is being used by spam bots then you can block the whole IP address range like the one below:

RewriteEngine On

Options +FollowSymlinks

Order Allow,Deny

Allow from all

Deny from 76.149.24.0/24

Here 76.149.24.0/24 is a CIDR range.

CIDR is a method used for representing range of IP addresses.

Blocking by CIDR is more effective than blocking by individual IP addresses as it takes less space on your server.
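To make the CIDR notation concrete: in 76.149.24.0/24 the ‘/24’ means that the first 24 bits are fixed, so the range covers 76.149.24.0 to 76.149.24.255. The sketch below checks whether an IPv4 address falls inside a CIDR range. The helper names ‘ipToNumber’ and ‘inCidr’ are made up, and the code assumes well-formed IPv4 input with no validation.

```javascript
// Convert a dotted IPv4 address into a single number.
function ipToNumber(ip) {
  return ip.split('.').reduce(function (total, octet) {
    return total * 256 + parseInt(octet, 10);
  }, 0);
}

// Return true when the IPv4 address lies inside the CIDR range.
function inCidr(ip, cidr) {
  var parts = cidr.split('/');
  var base = ipToNumber(parts[0]);
  var prefixLength = parseInt(parts[1], 10);
  // Number of addresses in the block, e.g. 2^(32-24) = 256 for a /24.
  var blockSize = Math.pow(2, 32 - prefixLength);
  // Two addresses are in the same block when they share the same block index.
  return Math.floor(ipToNumber(ip) / blockSize) === Math.floor(base / blockSize);
}

console.log(inCidr('76.149.24.17', '76.149.24.0/24')); // true
console.log(inCidr('76.149.25.1', '76.149.24.0/24'));  // false
```

Plain arithmetic is used instead of bitwise operators because JavaScript bitwise operators work on signed 32-bit integers, which gives surprising results for addresses above 127.255.255.255.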

Method #4: Block the user agents used by spam bot

Go through your server log files once a week and find and ban malicious user agents (user agents used by spambots).

Blocked user agents cannot access your website.

You can block rogue user agents like the one below:

RewriteEngine On

Options +FollowSymlinks

RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]

RewriteRule .* - [F,L]

A simple search on google can give you a big list of several websites which maintain records of known rogue user agents.

 


Other use cases of regex (Regex in SEO)


Besides Google Analytics and Google Tag Manager, regex are widely used in Search Engine Optimization (SEO).

Following are the advantages of using regex in SEO:

1. Convert long, ugly dynamic URLs into SEO-friendly URLs
2. Apply correct redirects
3. Prevent people from hotlinking your images
4. Block spam bots
5. Resolve canonical URL issues
6. Resolve duplicate content issues (to an extent)
7. Deliver geo-specific content based on the IP address

Example-1: Redirect all requests for pages in the media folder to a new page ‘media.html’.

RewriteRule ^media/ /media.html [R=301,L]
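
The dollar sign makes a big difference in this kind of rule: ^media/$ matches only the folder URL itself, while ^media/ matches every path that starts with the folder name. The JavaScript sketch below compares the two. It assumes the path is tested without a leading slash, as in an .htaccess file at the site root, and the sample paths are made up.

```javascript
// Matches only the folder itself.
const exactFolder = /^media\/$/;
// Matches anything that starts with "media/".
const anyInFolder = /^media\//;

// Hypothetical request paths, without the leading slash.
const paths = ["media/", "media/photo.html", "media/2019/clip.mp4", "mediakit.html"];

for (const path of paths) {
  console.log(path, exactFolder.test(path), anyInFolder.test(path));
}
```

Only "media/" passes the first pattern. The second pattern also catches the files inside the folder but still leaves "mediakit.html" alone, because the slash is part of the pattern.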

Example-2: Redirect oldaddress.html page to newaddress.html page

RewriteRule ^oldaddress\.html$ /newaddress.html [r=301,l]

Example-3: Redirect one website to another

Redirect 301 / https://www.anotherwebsite.com/

Example-4: Redirect abc.com/index.html to www.abc.com

RewriteCond %{HTTP_HOST} ^abc\.com$ [NC]
RewriteRule ^index\.html$ https://www.abc.com/ [R=301,L]

Example-5: Block a visitor with the IP address 12.34.56.78 from viewing your file product-prices.html

RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$
RewriteRule ^product-prices\.html$ - [F]

Example-6: Block a visitor with the IP address 12.34.56.78 from viewing your folder ‘sales-demo’

RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$
RewriteRule ^sales-demo/ - [F]

Example-7: Block a visitor with the IP address 12.34.56.78 from viewing your website www.abc.com

RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$
RewriteRule ^.*$ - [F]
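
The ^ and $ anchors in the REMOTE_ADDR pattern matter: without them the pattern would also match longer addresses that merely contain the blocked one. A short JavaScript sketch with made-up visitor addresses shows the difference.

```javascript
// Anchored: the whole address must be exactly 12.34.56.78.
const anchored = /^12\.34\.56\.78$/;
// Unanchored: the address only has to contain 12.34.56.78.
const unanchored = /12\.34\.56\.78/;

const blockedVisitor = "12.34.56.78";
const otherVisitor = "212.34.56.78"; // a different address

console.log(anchored.test(blockedVisitor), unanchored.test(blockedVisitor)); // true true
console.log(anchored.test(otherVisitor), unanchored.test(otherVisitor));     // false true
```

With the unanchored pattern, the visitor from 212.34.56.78 would be blocked by mistake.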

Example-8: Apply 301 from one file to another file

Redirect 301 /file1.html https://www.mywebsite.com/file2.html

The above code permanently redirects file1.html to file2.html, so whenever a search engine or a visitor requests file1.html, they are automatically redirected to file2.html.

Example-9: Convert a dynamic URL into a static-looking, SEO-friendly URL

RewriteCond %{QUERY_STRING} ^keyval=25&keyval2=62$ [NC]

RewriteRule ^productdescription\.php$ https://www.example.com/whiteboard-accessories.php? [R=301,L]

This code will redirect https://www.example.com/productdescription.php?keyval=25&keyval2=62 to https://www.example.com/whiteboard-accessories.php

Note: You need to put a question mark (?) at the end of the substitution URL; otherwise the original query string will be appended to the substitution URL.
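
The logic of this redirect can be sketched in JavaScript to check which URLs it would catch. redirectTarget is a hypothetical function written for this illustration. It only mimics the two pattern checks (file name and query string) and is not a model of everything mod_rewrite does.

```javascript
// Return the redirect target for a request URL, or null when no rule applies.
function redirectTarget(requestUrl) {
  const url = new URL(requestUrl);
  // Counterpart of the RewriteRule pattern (the path here has a leading slash).
  const pathMatches = /^\/productdescription\.php$/.test(url.pathname);
  // Counterpart of the RewriteCond on QUERY_STRING; "i" stands in for [NC].
  const queryMatches = /^keyval=25&keyval2=62$/i.test(url.search.slice(1));
  // The target carries no query string, like the trailing "?" in the rule.
  return pathMatches && queryMatches
    ? "https://www.example.com/whiteboard-accessories.php"
    : null;
}

console.log(redirectTarget("https://www.example.com/productdescription.php?keyval=25&keyval2=62"));
console.log(redirectTarget("https://www.example.com/productdescription.php?keyval2=62&keyval=25"));
```

The second call returns null: the pattern is anchored at both ends, so the parameters have to appear in exactly this order for the redirect to fire.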

Example-10: Redirect non-www to www

RewriteCond %{HTTP_HOST} ^mywebsite\.com$ [NC]
RewriteRule ^(.*)$ https://www.mywebsite.com/$1 [R=301,L]

Note: Replace ‘mywebsite’ with your website name

Example-11: Create Custom 404 page

Create the web page you want to display as your custom 404 page, say custom404.php, and upload it to the root directory. Now add the following code to your .htaccess file:

Options +FollowSymLinks
RewriteEngine on
ErrorDocument 404 /custom404.php

Use a local path here rather than a full URL. With a full URL, Apache redirects the visitor to the page and the browser or search engine never receives the 404 status code.

Example-12: Block an IP address from accessing your website

Add the following code to your .htaccess file:

Options +FollowSymLinks
RewriteEngine on
Order Deny,Allow
Deny from 61.16.153.67

If you want to block two or more IP addresses:

Options +FollowSymLinks
RewriteEngine on
Order Deny,Allow
Deny from 61.16.153.67
Deny from 124.202.86.42

Example-13: Resolve the Hot Linking Issue

Hot-linking means linking directly to the files on your website (images, videos etc.) from another website. By preventing hot-linking, you can save your server bandwidth. Add the following code to your .htaccess file:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?mywebsite\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpg|jpeg|gif|bmp|png|swf)$ - [F]

Replace ‘mywebsite’ with your website name and then use a hotlinking checker tool to find out whether your files (images, videos etc.) can still be hot-linked.
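
The two conditions and the rule work together: a request is refused only when the referrer is not your own site, the referrer is not empty, and the requested file has one of the listed extensions. The JavaScript sketch below restates that logic. isHotlink is a hypothetical helper and the sample values are made up.

```javascript
// Return true when a request would be refused as a hotlink.
function isHotlink(referrer, path) {
  // Referrers from your own site, with or without a subdomain.
  const ownSite = /^https?:\/\/(.+\.)?mywebsite\.com\//i;
  // File types that are protected against hotlinking.
  const protectedFile = /\.(jpg|jpeg|gif|bmp|png|swf)$/;
  // An empty referrer (direct visit, some browsers and proxies) is allowed.
  return referrer !== "" && !ownSite.test(referrer) && protectedFile.test(path);
}

console.log(isHotlink("https://www.othersite.com/page", "images/logo.png")); // true
console.log(isHotlink("https://www.mywebsite.com/page", "images/logo.png")); // false
console.log(isHotlink("", "images/logo.png"));                               // false
```

Allowing the empty referrer is a design choice: some visitors and tools never send one, and blocking them would break images for legitimate users.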

Example-14: Enable proxy caching for static resources

Add the following code to your .htaccess file:

<FilesMatch "\.(gif|jpe?g|png)$">
Header set Cache-Control "public"
</FilesMatch>
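
The FilesMatch pattern uses the question mark metacharacter so that jpe?g covers both jpg and jpeg. A quick JavaScript check with made-up file names shows which files the pattern selects.

```javascript
// File name pattern from the FilesMatch block.
const cacheable = /\.(gif|jpe?g|png)$/;

// Hypothetical file names.
const files = ["logo.png", "photo.jpg", "photo.jpeg", "banner.gif", "styles.css"];

// Keep only the file names the pattern selects.
const matched = files.filter((name) => cacheable.test(name));
console.log(matched); // every file except styles.css
```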

 

Related Tools:

  1. To learn more about regular expressions: https://www.regular-expressions.info/
  2. The Regex Coach is a graphical application for Windows which can be used to test regular expressions
  3. Regular Expression Checker: a Chrome add-on to test regex

Himanshu Sharma

Certified web analyst and founder of OptimizeSmart.com

My name is Himanshu Sharma and I help businesses find and fix their Google Analytics and conversion issues. If you have any questions or comments please contact me.

  • Over eleven years' experience in SEO, PPC and web analytics
  • Google Analytics certified
  • Google AdWords certified
  • Nominated for Digital Analytics Association Award for Excellence
  • Bachelors degree in Internet Science
  • Founder of OptimizeSmart.com and EventEducation.com
