What I learned from trying to fix Ghost Referrer Spam in Google Analytics

Failure is a great teacher, and I think when you make mistakes and you recover from them and you treat them as valuable learning experiences, then you’ve got something to share.

– Steve Harvey

This is one of those situations where I have something valuable to share, not because I succeeded but because I failed.

For a good two weeks I tried to block fake referrer spam in a way that is scalable and that would save me from regularly updating my view filters.

Had things worked out in my favour, the title of this blog post would be “How to block Ghost referrer spam in Google Analytics”.

Nevertheless, I learned tons during this process.

Here is what I learned and what you will learn from this article:

#1 You will learn exactly what Ghost referrer is.

#2 You will learn how spammers generate and send fake hits.

I ended up learning this process to the point where I can actually generate and send a fake hit to any web property. But I won’t share that much detail, to prevent abuse.

#3 You will learn to change the request URI in Google Analytics

#4 You will learn something new about Google Analytics view filters.

Whenever a user clicks a link on a website to go to your website, the user’s web browser sends an HTTP request to your web server.

This HTTP request is made up of a request line and request headers:

HTTP Request Headers

One of the request header fields is ‘referer’. The referer contains the URL of the last web page a user was on before visiting your website.

The Google Analytics tracking code captures this referrer data and sends it to your web property.

That’s how you can see referrer data in your GA reports.
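To make the request line and the referer header concrete, here is a minimal sketch that parses a raw HTTP request and pulls out the ‘referer’ field. The request text, the host names and the `parseRequest` helper are all made up for illustration; they are not taken from any real visit or from the Google Analytics code.

```javascript
// A made-up raw HTTP request, as a browser might send it after a link click.
const rawRequest = [
  "GET /contact/ HTTP/1.1",                        // request line
  "Host: www.example.com",                         // request headers start here
  "User-Agent: ExampleBrowser/1.0",
  "Referer: https://blog.example.org/some-post/",  // page the user came from
  "",
].join("\r\n");

// Hypothetical helper: split a raw request into its request line
// and a map of lower-cased header names to values.
function parseRequest(raw) {
  const [requestLine, ...headerLines] = raw.split("\r\n");
  const headers = {};
  for (const line of headerLines) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip the blank line that ends the headers
    const name = line.slice(0, idx).trim().toLowerCase();
    headers[name] = line.slice(idx + 1).trim();
  }
  return { requestLine, headers };
}

const parsed = parseRequest(rawRequest);
console.log(parsed.requestLine);     // GET /contact/ HTTP/1.1
console.log(parsed.headers.referer); // https://blog.example.org/some-post/
```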


However, it is also possible to send referrer data directly to Google Analytics via the measurement protocol.

In this case Google Analytics can record a referrer even when no referrer is sent from a web browser.

This type of referrer is known as a ghost referrer.

It is a ghost referrer for you and me because it is not actually referring (sending) any traffic to your website.

But it is not a ghost referrer for Google Analytics.

For Google Analytics it is just another referrer.

Through the Google Analytics measurement protocol it is possible to fake any hit (pageview, screenview, event, etc.), not just the referrer.

All you need is the Tracking ID of the website where you want to send the fake hit.

The following example demonstrates how to format the payload data to send a pageview hit to Google Analytics via the measurement protocol:

v=1 // Version.
&tid=UA-XXXX-Y // Tracking ID / Property ID.
&cid=555 // Anonymous Client ID.

&t=pageview // Pageview hit type.
&dh=mydemo.com // Document hostname.
&dp=/home // Page.
&dt=homepage // Title.

Payload data is the data you send to the Google Analytics server using the measurement protocol.

The following are some of the parameters of the payload data:

  1. v
  2. &tid
  3. &cid
  4. &t
  5. &dh
  6. &dp
  7. &dt

All of the parameters of Payload data (excluding the version and the tracking ID) can be faked.

That means you can provide any value to these parameters.

So, for example, the following payload data is perfectly valid:

v=1 // Version.
&tid=UA-12344-1 // Tracking ID / Property ID.
&cid=45438595854645 // Randomly generated Client ID.

&t=pageview // Pageview hit type.
&dh=igotyou.com // Document hostname.
&dp=/youCantCatchMe // Page.
&dt=Your%20Worst%20Nightmare // Title.

Here, client ID, host name, request URI (Page) and page title all have been faked.
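The annotated payload above is really a single URL-encoded string of `name=value` pairs joined by `&`. The sketch below shows one way to assemble that string from the same example values. The `buildPayload` helper is a hypothetical name introduced here for illustration, and the code only builds the string; it does not send anything.

```javascript
// Parameter names and values copied from the example payload above.
const params = {
  v: "1",                     // Version.
  tid: "UA-12344-1",          // Tracking ID / Property ID.
  cid: "45438595854645",      // Randomly generated Client ID.
  t: "pageview",              // Pageview hit type.
  dh: "igotyou.com",          // Document hostname.
  dp: "/youCantCatchMe",      // Page.
  dt: "Your Worst Nightmare", // Title.
};

// Hypothetical helper: URL-encode each name and value, then join with "&".
function buildPayload(fields) {
  return Object.entries(fields)
    .map(([name, value]) => `${encodeURIComponent(name)}=${encodeURIComponent(value)}`)
    .join("&");
}

const payload = buildPayload(params);
console.log(payload);
// The title is encoded as Your%20Worst%20Nightmare, as in the example above.
```

Nothing in the string ties a value to a real page, host or visitor, which is why every field except the version and the tracking ID can be set freely.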

Similarly, the following payload data (though perfectly valid) can be used to send fake event data to Google Analytics:

v=1 // Version.
&tid=UA-12344-1 // Tracking ID / Property ID.
&cid=45438595854645 // Randomly generated Client ID.

&t=event // Event hit type
&ec=igotyou.com // Event Category. Required.
&ea=play // Event Action. Required.

By making an HTTP POST request to www.google-analytics.com, you can send this payload data to the web property ‘UA-12344-1’.

It is believed that spammers gain access to tracking IDs by two methods:

#1 Through Spam bots which crawl websites and scrape tracking IDs.

#2 By randomly generating property IDs and targeting websites randomly.

As I understand it, the tracking IDs, once scraped, are stored by spammers and then targeted repeatedly to propagate referrer spam.

Since the majority of the payload data can be faked, spammers can very easily change any of the payload parameters to escape view filters. So unless you update your filters on a regular basis, you can’t get rid of ghost hits like fake referrers.

Before I move forward, I want to be sure that you understand the difference between Ghost referrer spam and spam traffic from spam bots.

Unlike ghost referrers, spam bots (like .com) crawl your website in order to send fake referrer headers and can be blocked via the ‘htaccess’ file.

In this article I will only talk about ghost referrer spam. If you want to learn about referrer spam generated by spam bots then read this article: Geek guide to removing referrer spam in Google Analytics

I have created a custom report through which you can detect almost all referral spam (whether spambot or ghost referrer) on your website.

You can download this report from Google Analytics Solutions Gallery here

 

Encoded request URI method

To block ghost referrers, I encoded all the request URIs of my website. Here is how it was supposed to work:

#1 A web page reports an encoded URI instead of the actual URI

#2 Encoded URI is sent to Google Analytics

#3 Only request URIs which contain the security key are allowed in Google Analytics reports.

#4 Decode Request URI within Google Analytics

 

#1 A web page reports an encoded URI instead of the actual URI

For example, if someone visits my home page, the request URI is no longer / but something like sdfsdfjdrwrwe90424/

This request URI is made up of two components:

#1 Security key which can be any string value. For example sdfsdfjdrwrwe90424

#2 Path name of the current URL. For example /

Similarly, if someone visits the ‘contact’ page on my website, the request URI would be something like sdfsdfjdrwrwe90424/contact/ and not /contact/

The objective of encoding my URIs was that spammers could no longer send ghost hits to my website via the measurement protocol.
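The encoding described here is plain string concatenation: security key first, path name second. A minimal sketch, using the example key from above and a hypothetical helper name (`encodeRequestUri`):

```javascript
// Hypothetical helper: prefix the path name with the security key.
function encodeRequestUri(securityKey, pathname) {
  return securityKey + pathname;
}

const exampleKey = "sdfsdfjdrwrwe90424"; // example key from the text above

console.log(encodeRequestUri(exampleKey, "/"));         // sdfsdfjdrwrwe90424/
console.log(encodeRequestUri(exampleKey, "/contact/")); // sdfsdfjdrwrwe90424/contact/
```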

 

#2 Encoded URI is sent to Google Analytics

When someone visits the home page of a website, the URI sent to Google Analytics is usually /

But when you encode the URIs, Google Analytics starts getting the encoded URIs. So a visit to the home page is no longer reported as:

home page visit

but is reported as something like this:

home page visit2

 

#3 Only request URIs which contain the security key are allowed in Google Analytics reports.

If a request URI doesn’t contain the security key, the traffic from that URI is excluded from Google Analytics reports via a view filter.

Spammers in general know nothing about your website structure (unless they are specifically targeting you). That is why the majority of ghost referrers are reported for the home page of the website and the fake request URI is usually /:

ghost referrers

Since my view filter allows only those request URIs which contain the security key, all request URIs which don’t contain the security key, like /, will automatically be excluded from the reports.

Spammers can also send fake hits to non-existent web pages on my website, but that would not work either, as their request URIs still won’t contain the security key and will be excluded from my reports.

Even if they somehow find out my security key, I can change it at any time. The only way to generate the encoded request URI is to actually visit/crawl my website. Since ghost referrers don’t actually visit a website, they would all be excluded from my reports.

If they do crawl my website, then they are not ghost referrers but spam bots, and I can then block them through the htaccess file.

 

#4 Decode Request URI within Google Analytics

Decoding a request URI means removing the security key from request URI.

Decoding is required in order to understand GA reports. If my reports were full of encoded request URIs, it would be difficult to interpret the data, and the reports would look something like the one below:

dont make sense

So I need to remove these security keys. I can do that with a simple search and replace view filter.
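In Google Analytics this decoding is done by the search and replace filter, not by code. The sketch below only mirrors the intended effect in JavaScript: strip the security key from the front of the request URI and leave everything else alone. The helper name `decodeRequestUri` is hypothetical.

```javascript
// Hypothetical helper: remove the security key if the URI starts with it.
function decodeRequestUri(securityKey, requestUri) {
  return requestUri.startsWith(securityKey)
    ? requestUri.slice(securityKey.length)
    : requestUri; // URIs without the key are returned unchanged
}

const key = "sdfsdfjdrwrwe90424"; // example key from the text above

console.log(decodeRequestUri(key, "sdfsdfjdrwrwe90424/contact/")); // /contact/
console.log(decodeRequestUri(key, "sdfsdfjdrwrwe90424/"));         // /
```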

 

I followed the steps below to implement this concept:

#1 Modify the Google Analytics tracking code on my website so that it starts sending encoded URIs

<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-30449797-1', 'auto');
ga('send', 'pageview', '<enter your security key>' + location.pathname);

</script>

Choose any long alphanumeric string as your security key and make a note of it.

So, for example, if you choose dhffjsrr12353fdf4253kc as the security key, then your Google Analytics tracking code would look like the one below:

<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-30449797-1', 'auto');
ga('send', 'pageview', 'dhffjsrr12353fdf4253kc' + location.pathname);

</script>

The location.pathname property returns the path name of the current URL, and what I am doing here is prepending the security key to the path name.

That’s how I can modify the request URI.

 

#2 Create a view filter which only includes pages with the security key

I created a custom include filter with the following configuration:

custom include filter

Here the filter pattern is a regular expression which matches request URIs that start with the security key.
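As a rough illustration of what such a pattern is meant to do, the sketch below anchors the example security key to the start of the request URI and applies it to a few sample URIs. It uses JavaScript's own regular expression engine as a stand-in for the Google Analytics filter, and the sample URIs and the `escapeRegExp` helper are assumptions made for this example.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const securityKey = "dhffjsrr12353fdf4253kc"; // example key from step #1
const includePattern = new RegExp("^" + escapeRegExp(securityKey));

// Sample request URIs: two carry the key, two do not.
const requestUris = [
  "dhffjsrr12353fdf4253kc/",
  "dhffjsrr12353fdf4253kc/contact/",
  "/",
  "/youCantCatchMe",
];

// Intended behaviour of the include filter: keep only URIs that start with the key.
const included = requestUris.filter((uri) => includePattern.test(uri));
console.log(included);
// [ 'dhffjsrr12353fdf4253kc/', 'dhffjsrr12353fdf4253kc/contact/' ]
```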

 

#3 Create a search and replace filter which removes the security key from the request URI

I created a search and replace filter with the following configuration:

search and replace filter

 

What actually happened and what I learned

Whenever the request URI is generated via the measurement protocol, Google Analytics just couldn’t block the request URI, whether or not it contained the security key. In other words, when you send a request URI via the measurement protocol, Google Analytics filters based on the request URI just don’t work.

Even if you use a custom exclude filter to block the request URI of the home page (/), which is frequently targeted by spammers, it only succeeds in excluding the traffic that actually goes to the home page. It still can’t block the fake / request URI sent via the measurement protocol.

Spammers don’t send an equal number of fake hits to every web property. Some web properties get more fake hits than others. Spammers also don’t send an equal number of fake hits every day; the volume rises and falls with the day of the week. For example, spammers seem to be less active during weekends, as if they take a break, which suggests that the script used to abuse the measurement protocol is run manually.

Until Google makes the measurement protocol more secure, the only effective way to block ghost referrers is to include only the traffic from valid host names using a custom include filter, as described in this article: Geek guide to removing referrer spam in Google Analytics
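The valid host name approach boils down to a whitelist: a hit is kept only if its host name is one you actually serve your site from. The sketch below shows the idea with a regular expression built from a list of host names. The host names and sample hits are placeholders, and JavaScript's regular expression engine is used as a stand-in for the Google Analytics filter, so treat it as an illustration rather than the exact filter configuration.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Placeholder list of host names that legitimately serve the site.
const validHostnames = ["www.example.com", "example.com"];

// Match a host name only if it is exactly one of the valid ones.
const hostnamePattern = new RegExp(
  "^(" + validHostnames.map((h) => escapeRegExp(h)).join("|") + ")$"
);

// Made-up hits: one genuine, two with host names a spammer might send.
const hits = [
  { hostname: "www.example.com", page: "/" },
  { hostname: "igotyou.com", page: "/" },
  { hostname: "(not set)", page: "/" },
];

const genuineHits = hits.filter((hit) => hostnamePattern.test(hit.hostname));
console.log(genuineHits); // [ { hostname: 'www.example.com', page: '/' } ]
```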

Related Article: Tracking true referrals in Google Analytics when using PayPal and other payment gateways

 


Himanshu Sharma

Digital Marketing Consultant and Founder of Optimizesmart.com

Himanshu helps business owners and marketing professionals generate more sales and ROI by fixing their website tracking issues, helping them understand their customers’ true purchase journey and helping them determine the most effective marketing channels for investment.

He has over 12 years of experience in digital analytics and digital marketing.

He was nominated for the Digital Analytics Association's Awards for Excellence.

The Digital Analytics Association is a world renowned not-for-profit association which helps organisations overcome the challenges of data acquisition and application.

He is the author of four best-selling books on analytics and conversion optimization.
