What I learned from trying to fix the Ghost Referrer Spam in Google Analytics

 

Failure is a great teacher, and I think when you make mistakes and you recover from them and you treat them as valuable learning experiences, then you’ve got something to share.

– Steve Harvey

This is one of those situations where I have something valuable to share, not because I succeeded but because I failed.

For a good two weeks I tried to block fake referrer spam in a way that is scalable and that would spare me from regularly updating my view filters.

Had things worked out in my favour, the title of this blog post would be “How to block Ghost referrer spam in Google Analytics”. Nevertheless, I learned tons during this process.

Here is what I learned and what you will learn from this article:

#1 You will learn exactly what a ghost referrer is.

#2 You will learn how spammers generate and send fake hits.

I ended up learning this process to the point where I can actually generate and send a fake hit to any web property. But I won’t share too many details, to prevent abuse.

#3 You will learn to change the request URI in Google Analytics.

#4 You will learn something new about Google Analytics view filters.

Whenever a user clicks a link on a website to go to your website, the user’s web browser sends an HTTP request to your web server. This HTTP request is made up of a request line and request headers:

[Screenshot: HTTP request headers]

One of the request header fields is ‘Referer’. It contains the URL of the last web page a user was on before visiting your website.

The Google Analytics tracking code captures this referrer data and sends it to your web property. That’s how you can see referrer data in your GA reports.

However, it is also possible to send referrer data directly to Google Analytics via the Measurement Protocol. In this case Google Analytics can record a referrer even when no referrer is sent from a web browser. This type of referrer is known as a ghost referrer.

It is a ghost referrer for you and me because it is not actually referring (sending) any traffic to your website. But it is not a ghost referrer for Google Analytics. For Google Analytics it is just another referrer.

Through the Google Analytics Measurement Protocol it is possible to fake any hit (pageview, screenview, event etc.) and not just the referrer. All you need is the Tracking ID of the website where you want to send the fake hit.

The following example demonstrates how to format the payload data to send a pageview hit to Google Analytics via the Measurement Protocol:

v=1 // Version.
&tid=UA-XXXX-Y // Tracking ID / Property ID.
&cid=555 // Anonymous Client ID.

&t=pageview // Pageview hit type.
&dh=mydemo.com // Document hostname.
&dp=/home // Page.
&dt=homepage // Title.

Payload data is the data you send to the Google Analytics server using the Measurement Protocol. The following are some of the payload parameters:

  1. v
  2. &tid
  3. &cid
  4. &t
  5. &dh
  6. &dp
  7. &dt

All of the payload parameters (excluding the version and the tracking ID) can be faked. That means you can provide any value for these parameters.
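As a rough sketch of how such a payload string is put together, the snippet below assembles the pageview payload from the first example using the standard `URLSearchParams` class. The `buildPayload` helper is a name I made up for illustration; only the parameter names (v, tid, cid, t, dh, dp, dt) and their values come from the example above. It only builds the string and does not send anything.

```javascript
// Hypothetical helper (not part of any Google library): turns a plain
// object of payload parameters into a URL-encoded string.
function buildPayload(params) {
  // URLSearchParams takes care of encoding reserved characters.
  return new URLSearchParams(params).toString();
}

const payload = buildPayload({
  v: '1',            // Version.
  tid: 'UA-XXXX-Y',  // Tracking ID / Property ID (placeholder).
  cid: '555',        // Anonymous Client ID.
  t: 'pageview',     // Pageview hit type.
  dh: 'mydemo.com',  // Document hostname.
  dp: '/home',       // Page.
  dt: 'homepage',    // Title.
});

console.log(payload);
```

Because `URLSearchParams` percent-encodes reserved characters, /home shows up as %2Fhome in the printed string.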

So, for example, the following payload data is perfectly valid:

v=1 // Version.
&tid=UA-12344-1 // Tracking ID / Property ID.
&cid=45438595854645 // Randomly generated Client ID.

&t=pageview // Pageview hit type.
&dh=igotyou.com // Document hostname.
&dp=/youCantCatchMe // Page.
&dt=Your%20Worst%20Nightmare // Title.

Here the client ID, host name, request URI (Page) and page title have all been faked.

Similarly, the following payload data (also perfectly valid) can be used to send fake event data to Google Analytics:

v=1 // Version.
&tid=UA-12344-1 // Tracking ID / Property ID.
&cid=45438595854645 // Randomly generated Client ID.

&t=event // Event hit type
&ec=igotyou.com // Event Category. Required.
&ea=play // Event Action. Required.

 

By making an HTTP POST request to www.google-analytics.com, you can send this payload data to the web property ‘UA-12344-1’, whoever owns it.

It is believed that spammers get hold of tracking IDs in two ways:

#1 Through Spam bots which crawl websites and scrape tracking IDs.

#2 By randomly generating property IDs and targeting websites randomly.

According to my understanding, the tracking IDs, once scraped, are stored by spammers and are then targeted repeatedly to propagate referrer spam.

Since the majority of the payload data can be faked, spammers can very easily change any payload parameter to escape view filters. So unless you update your filters on a regular basis, you can’t get rid of ghost hits like fake referrers.
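To see why a blocklist-style filter needs constant upkeep, consider this small simulation. It is only an illustration: the `excludePattern` regex, the `isExcluded` function and the spam domain names are all made up, and real view filters run inside Google Analytics, not in JavaScript.

```javascript
// Made-up blocklist of referrer domains already spotted in reports.
const excludePattern = /(^|\.)(spam-one\.example|spam-two\.example)$/;

// Returns true when a referrer hostname matches the blocklist,
// i.e. when an exclude filter built from it would drop the hit.
function isExcluded(referrerHost) {
  return excludePattern.test(referrerHost);
}

console.log(isExcluded('spam-one.example'));   // known domain, gets filtered
console.log(isExcluded('spam-three.example')); // new domain, slips through
```

The spammer only has to change one value in the payload, while I have to notice the new domain and edit the filter.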

Before I move forward, I want to be sure that you understand the difference between Ghost referrer spam and spam traffic from spam bots.

Unlike ghost referrers, spam bots (like semalt.com) actually crawl your website in order to send fake referrer headers, and they can be blocked via the ‘.htaccess’ file.

In this article I will only talk about ghost referrer spam. If you want to learn about referrer spam generated by spam bots then read this article: Geek guide to removing referrer spam in Google Analytics

I have created a custom report through which you can detect almost all referral spam (whether spambot or ghost referrer) on your website. 

You can download this report from Google Analytics Solutions Gallery here

 

Encoded request URI method

To block ghost referrers, I encoded all the request URIs of my website. Here is how it was supposed to work:

#1 A web page reports an encoded URI instead of the actual URI

#2 Encoded URI is sent to Google Analytics

#3 Only request URIs that contain the security key are allowed in Google Analytics reports.

#4 Decode Request URI within Google Analytics

 

#1 A web page reports an encoded URI instead of the actual URI

For example, if someone visits my home page, the request URI is no longer / but something like this: sdfsdfjdrwrwe90424/

This request URI is made up of two components:

#1 Security key which can be any string value. For example sdfsdfjdrwrwe90424

#2 Path name of the current URL. For example /

Similarly, if someone visits the ‘contact’ page on my website, the request URI would be something like sdfsdfjdrwrwe90424/contact/ and not /contact/

The objective of encoding my URIs was that spammers could no longer send ghost hits to my website via the Measurement Protocol.
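The encoding itself is nothing more than string concatenation. Here is a minimal sketch, assuming the example key from above; `encodeRequestUri` is a hypothetical helper name I am using for illustration, not something Google Analytics provides.

```javascript
// Hypothetical helper: prefixes the real path with the security key,
// mirroring the examples in the text (security key + path name).
function encodeRequestUri(securityKey, pathname) {
  return securityKey + pathname;
}

const exampleKey = 'sdfsdfjdrwrwe90424'; // example key used in this article

console.log(encodeRequestUri(exampleKey, '/'));         // sdfsdfjdrwrwe90424/
console.log(encodeRequestUri(exampleKey, '/contact/')); // sdfsdfjdrwrwe90424/contact/
```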

 

#2 Encoded URI is sent to Google Analytics

When someone visits the home page of a website, the URI sent to Google Analytics is usually /

But when you encode the URIs, Google Analytics starts getting the encoded URIs. So a visit to the home page is no longer reported as:

[Screenshot: home page visit reported as /]

but is reported something like this:

[Screenshot: home page visit reported with the encoded URI]

 

#3 Only request URIs that contain the security key are allowed in Google Analytics reports.

If a request URI doesn’t contain the security key, the traffic for that URI is excluded from Google Analytics reports via a view filter.

Spammers in general know nothing about your website structure (unless they are specifically targeting you). That is why the majority of ghost referrers are reported for the home page of the website, and the fake request URI is usually /:

[Screenshot: ghost referrers]

Since my view filter allows only those request URIs that contain the security key, all request URIs that don’t contain it, like /, will automatically be excluded from the reports.

Spammers can also send fake hits to non-existent web pages on my website, but that would not work either, as their request URIs still won’t contain the security key and will be excluded from my reports.

Even if they somehow find out my security key, I can change it at any time. The only way to generate the encoded request URI is to actually visit/crawl my website. Since ghost referrers don’t actually visit a website, they would all be excluded from my reports.

If they do crawl my website, then they are not ghost referrers but spam bots, and I can block them through the .htaccess file.
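The intended behaviour of that include filter can be sketched as follows. This models the idea only, not how Google Analytics actually evaluates hits; `SECURITY_KEY`, `escapeRegExp` and `passesIncludeFilter` are names I made up for the illustration.

```javascript
const SECURITY_KEY = 'sdfsdfjdrwrwe90424'; // example key used in this article

// Escape regex metacharacters so the key is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Include pattern: the request URI must start with the security key.
const includePattern = new RegExp('^' + escapeRegExp(SECURITY_KEY));

// Returns true when a request URI would be kept by the include filter.
function passesIncludeFilter(requestUri) {
  return includePattern.test(requestUri);
}

console.log(passesIncludeFilter('sdfsdfjdrwrwe90424/contact/')); // kept
console.log(passesIncludeFilter('/'));                           // dropped
console.log(passesIncludeFilter('/youCantCatchMe'));             // dropped
```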

 

#4 Decode Request URI within Google Analytics

Decoding a request URI means removing the security key from request URI.

Decoding is required in order to understand GA reports. If my reports are full of encoded request URIs, the data becomes difficult to interpret, and the reports would look something like the one below:

[Screenshot: report with encoded request URIs that don’t make sense]

So I need to remove these security keys. I can do that with a simple search and replace view filter.
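In code terms, the decoding step amounts to stripping the key from the front of the URI. A minimal sketch, again with a made-up helper name (`decodeRequestUri`) and the example key from above:

```javascript
// Hypothetical helper: removes the security key from the start of a
// request URI and leaves URIs without the key untouched.
function decodeRequestUri(securityKey, requestUri) {
  return requestUri.startsWith(securityKey)
    ? requestUri.slice(securityKey.length)
    : requestUri;
}

const demoKey = 'sdfsdfjdrwrwe90424'; // example key used in this article

console.log(decodeRequestUri(demoKey, 'sdfsdfjdrwrwe90424/contact/')); // /contact/
console.log(decodeRequestUri(demoKey, 'sdfsdfjdrwrwe90424/'));         // /
```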

 

I followed the steps below to implement this concept:

#1 Modify the Google Analytics tracking code on my website so that it starts sending encoded URIs

<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-30449797-1', 'auto');
// Report the security key + path name instead of the real path name.
ga('send', 'pageview', '<enter your security key>' + location.pathname);

</script>

Choose any long alphanumeric string as your security key and make a note of it.

So, for example, if you choose dhffjsrr12353fdf4253kc as the security key, your Google Analytics tracking code would look like the one below:

<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-30449797-1', 'auto');
// Report the security key + path name instead of the real path name.
ga('send', 'pageview', 'dhffjsrr12353fdf4253kc' + location.pathname);

</script>

location.pathname returns the path name of the current URL, and what I am doing here is prepending the security key to the path name.

That’s how I can modify the request URI.

 

#2 Create a view filter which only includes pages with the security key

I created a custom include filter with the following configuration:

[Screenshot: custom include filter]

Here the filter pattern is a regular expression which matches request URIs that start with the security key.

 

#3 Create a search and replace filter which removes the security key from the request URI

I created a search and replace filter with the following configuration:

[Screenshot: search and replace filter]

 

What actually happened and what I learned

Whenever the request URI is sent via the Measurement Protocol, Google Analytics just couldn’t block it, whether or not it contained the security key. In other words, when a request URI arrives via the Measurement Protocol, Google Analytics filters based on the request URI just don’t work.

Even if you block the request URI of the home page (/), which is frequently targeted by spammers, with a custom exclude filter, the filter only excludes the real traffic going to the home page. It still can’t block the fake / request URI sent via the Measurement Protocol.

Spammers don’t send an equal number of fake hits to every web property; some web properties get more fake hits than others. Nor do they send an equal number of fake hits every day: the volume rises and falls with the day of the week. For example, spammers seem to be less active during weekends, as if they take a break, which suggests that the script used to abuse the Measurement Protocol is run manually.

Until Google makes the Measurement Protocol more secure, the only effective way to block ghost referrers is to include only the traffic from valid host names using a custom include filter, as described in this article: Geek guide to removing referrer spam in Google Analytics
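As a rough sketch of what a valid host name include pattern could look like, the snippet below builds a regular expression from a list of host names. Everything in it is an assumption for illustration: the host names are placeholders, `buildHostnamePattern` and `isValidHostname` are made-up helpers, and the actual filter has to be configured in Google Analytics as described in the linked article.

```javascript
// Placeholder list of host names that legitimately serve my tracking code.
const validHostnames = ['example.com', 'www.example.com', 'shop.example.com'];

// Builds an anchored regex that matches any of the given host names exactly.
function buildHostnamePattern(hostnames) {
  const escaped = hostnames.map(function (host) {
    // Escape regex metacharacters so dots are matched literally.
    return host.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  });
  return new RegExp('^(' + escaped.join('|') + ')$');
}

const hostnamePattern = buildHostnamePattern(validHostnames);

// Returns true when a hit's host name would be kept by the include filter.
function isValidHostname(hostname) {
  return hostnamePattern.test(hostname);
}

console.log(isValidHostname('www.example.com')); // kept
console.log(isValidHostname('igotyou.com'));     // dropped
```

The pattern is anchored at both ends so that a look-alike such as example.com.spam.test does not pass.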

Related Article: Tracking true referrals in Google Analytics when using PayPal and other payment gateways


Himanshu Sharma

Certified web analyst and founder of OptimizeSmart.com

My name is Himanshu Sharma and I help businesses find and fix their Google Analytics and conversion issues. If you have any questions or comments please contact me.

  • Over eleven years' experience in SEO, PPC and web analytics
  • Google Analytics certified
  • Google AdWords certified
  • Nominated for Digital Analytics Association Award for Excellence
  • Bachelors degree in Internet Science
  • Founder of OptimizeSmart.com and EventEducation.com
