What I learned from trying to fix ghost referrer spam in Google Analytics

Last Updated: August 26, 2021

Failure is a great teacher, and I think when you make mistakes and you recover from them and you treat them as valuable learning experiences, then you’ve got something to share.

– Steve Harvey

This is one of those situations where I have something valuable to share, not because I succeeded, but because I failed. For a good two weeks, I tried to block fake referrer spam in a way that is scalable and that spares me from regularly updating my view filters.

Had anything worked out in my favour, the title of this blog post would be “How to block ghost referrer spam in Google Analytics”. Nevertheless, I learned tons during this process.

Here is what I learned and what you will learn from this article:

  1. You will learn exactly what a ghost referrer is.
  2. You will learn how spammers generate and send fake hits.
    I ended up learning this process to the point where I can actually generate and send a fake hit to any web property. However, to prevent abuse, I won’t share that level of detail.
  3. You will learn to change the request URI in Google Analytics.
  4. You will learn something new about Google Analytics view filters.

Whenever a user clicks a link on a website to go to your website, the user’s web browser sends an HTTP request to your web server. This HTTP request is made up of a request line and request headers:

[Image: HTTP request headers]

One of the fields of the request header is ‘referer’. Referer contains the URL of the last web page a user was on before visiting your website. The Google Analytics tracking code captures this referrer data and sends it to your web property. That’s how you can see referrer data in your GA reports.
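
As a rough illustration (not the actual Google Analytics implementation), a script running on a page can read the referrer through the standard `document.referrer` property. The helper name `referrerHostname` below is hypothetical, and the example passes a literal URL so that it also runs outside a browser:

```javascript
// Hypothetical helper: extract the hostname from a referrer URL.
// In a browser you would pass document.referrer; an empty string means
// no referrer is available (for example, a direct visit).
function referrerHostname(referrer) {
  if (!referrer) {
    return null; // no referrer available
  }
  try {
    return new URL(referrer).hostname;
  } catch (err) {
    return null; // not a parseable absolute URL
  }
}

// Example usage with a literal value instead of document.referrer:
console.log(referrerHostname("https://example.com/blog/post?id=1"));
```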

However, it is also possible to send referrer data directly to Google Analytics via the measurement protocol. In this case, Google Analytics can record a referrer even when no referrer is sent from a web browser. This type of referrer is known as a ghost referrer.

It is a ghost referrer for me and you because it is not actually referring (sending) any traffic to your website. But it is not a ghost referrer for Google Analytics. For Google Analytics it is just another referrer.

Through the Google Analytics measurement protocol, it is possible to fake any hit (pageview, screenview, event, etc.) and not just the referrer. All you need is the tracking ID of the website where you want to send the fake hit.

The following example demonstrates how to format the payload data to send a pageview hit to Google Analytics via the measurement protocol:

v=1 // Version.
&tid=UA-XXXX-Y // Tracking ID / Property ID.
&cid=555 // Anonymous Client ID.

&t=pageview // Pageview hit type.
&dh=mydemo.com // Document hostname.
&dp=/home // Page.
&dt=homepage // Title.

Payload data is the data you send to the Google Analytics server using the measurement protocol.
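
A minimal sketch of how such a payload string could be assembled in JavaScript is shown here. It only builds the string and does not send anything. The function name `buildPayload` is an assumption for illustration, and the tracking ID is the placeholder from the example above:

```javascript
// Build a Measurement Protocol payload string from a plain object.
// URLSearchParams takes care of percent-encoding the values.
function buildPayload(fields) {
  return new URLSearchParams(fields).toString();
}

const payload = buildPayload({
  v: "1",            // Version.
  tid: "UA-XXXX-Y",  // Tracking ID / Property ID (placeholder).
  cid: "555",        // Anonymous Client ID.
  t: "pageview",     // Pageview hit type.
  dh: "mydemo.com",  // Document hostname.
  dp: "/home",       // Page.
  dt: "homepage",    // Title.
});

console.log(payload);
```

Note that `URLSearchParams` percent-encodes the slash in the page value, so `dp` appears as `%2Fhome` in the resulting string.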

The following are some of the parameters of the payload data:

  1. v
  2. &tid
  3. &cid
  4. &t
  5. &dh
  6. &dp
  7. &dt

All of the parameters of the payload data (excluding the version and the tracking ID) can be faked. In other words, you can assign any value to these parameters.

For example, the following payload data is perfectly valid:

v=1 // Version.
&tid=UA-12344-1 // Tracking ID / Property ID.
&cid=45438595854645 // Randomly generated Client ID.

&t=pageview // Pageview hit type.
&dh=igotyou.com // Document hostname.
&dp=/youCantCatchMe // Page.
&dt=Your%20Worst%20Nightmare // Title.

Here, client ID, hostname, request URI (Page), and page title all have been faked.
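
To make this concrete, the payload can be decoded back into its individual fields. The sketch below is only an illustration (the helper name `parsePayload` is hypothetical); it shows that the hostname, page and title are nothing more than strings chosen by the sender:

```javascript
// The payload from the example above, joined into a single line.
const rawPayload =
  "v=1&tid=UA-12344-1&cid=45438595854645" +
  "&t=pageview&dh=igotyou.com&dp=/youCantCatchMe&dt=Your%20Worst%20Nightmare";

// Parse the payload into a plain object of decoded values.
function parsePayload(raw) {
  return Object.fromEntries(new URLSearchParams(raw));
}

const hit = parsePayload(rawPayload);
console.log(hit.dh, hit.dp, hit.dt);
```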

Similarly, the following payload data, which is also perfectly valid, can be used to send fake event data to Google Analytics:

v=1 // Version.
&tid=UA-12344-1 // Tracking ID / Property ID.
&cid=45438595854645 // Randomly generated Client ID.

&t=event // Event hit type
&ec=igotyou.com // Event Category. Required.
&ea=play // Event Action. Required.

By making an HTTP POST request to www.google-analytics.com, you can send this payload data to the web property ‘UA-12344-1’, whoever owns it.

It is believed that spammers obtain tracking IDs by two methods:

  1. Through spambots that crawl websites and scrape tracking IDs.
  2. By randomly generating property IDs and targeting websites randomly.

As I understand it, once tracking IDs are scraped, spammers store them and then target them repeatedly to propagate referrer spam.

Since the majority of payload data can be faked, spammers can very easily change any of the payload parameters to escape view filters. So unless you update your filters on a regular basis, you can’t get rid of ghost hits like fake referrers.

Before I move forward, I want to be sure that you understand the difference between ghost referrer spam and spam traffic from spambots.

Unlike ghost referrers, spambots (like .com) actually crawl your website, sending fake referrer headers with their requests, and can therefore be blocked via the ‘.htaccess’ file.

In this article, I will only talk about ghost referrer spam. If you want to learn about referrer spam generated by spambots then read this article: Geek guide to removing referrer spam in Google Analytics

I have created a custom report through which you can detect almost all referral spam (whether spambot or ghost referrer) on your website.

You can download this report from the Google Analytics Solutions Gallery

Encoded request URI method

To block ghost referrers, I encoded all the request URIs of my website. Here is how it was supposed to work:

#1 A web page reports an encoded URI instead of the actual URI

#2 Encoded URI is sent to Google Analytics

#3 Only request URIs which contain the security key are allowed in Google Analytics reports.

#4 Decode request URI within Google Analytics

#1 A web page reports an encoded URI instead of the actual URI

For example, if someone visits my home page, the request URI is no longer / but something like sdfsdfjdrwrwe90424/

This request URI is made up of two components:

#1 Security key which can be any string value. For example sdfsdfjdrwrwe90424

#2 Path name of the current URL. For example /

Similarly, if someone visits the contact page on my website, the request URI would be something like sdfsdfjdrwrwe90424/contact/ and not /contact/
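
The encoding described here is plain string concatenation. As a minimal sketch (the helper name `encodeRequestUri` is hypothetical and the key is the example value used above):

```javascript
// Hypothetical helper: prefix a pathname with the security key.
function encodeRequestUri(securityKey, pathname) {
  return securityKey + pathname;
}

const SECURITY_KEY = "sdfsdfjdrwrwe90424"; // example key from the text

console.log(encodeRequestUri(SECURITY_KEY, "/"));         // home page
console.log(encodeRequestUri(SECURITY_KEY, "/contact/")); // contact page
```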

The objective of encoding my URIs was to ensure that spammers could no longer send ghost hits to my website via the measurement protocol.

#2 Encoded URI is sent to Google Analytics

When someone visits the home page of a website the URI sent to Google Analytics is usually /

But when you encode the URIs, Google Analytics starts receiving the encoded URIs. So a visit to the home page is no longer reported as:

[Screenshot: home page visit]

but is reported something like this:

[Screenshot: home page visit with the encoded request URI]

#3 Only request URIs which contain the security key are allowed in Google Analytics reports.

If a request URI doesn’t contain the security key, the traffic from that URI is excluded from Google Analytics reports via a view filter.

Spammers, in general, know nothing about your website structure (unless they are specifically targeting you). That is why the majority of ghost referrers are reported for the home page of the website and the fake request URI is usually /:

[Screenshot: ghost referrers]

Since my view filter allows only those request URIs that contain the security key, all request URIs that don’t contain the security key, like /, will automatically be excluded from the reports.

Spammers can also send fake hits to non-existing web pages on my website, but that would not work either, as their request URIs still won’t contain the security key and will be excluded from my reports.
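
The intended behaviour of this filter can be modelled outside Google Analytics. The sketch below is only a model of the idea under the assumptions stated in the comments; it does not reproduce how Google Analytics applies view filters, and the sample hits are made up for illustration:

```javascript
const SECURITY_KEY = "sdfsdfjdrwrwe90424"; // example key from the text

// Model of the intended include filter: keep a hit only if its
// request URI starts with the security key.
function passesIncludeFilter(requestUri) {
  return requestUri.startsWith(SECURITY_KEY);
}

// Made-up sample hits: one from a real page view, two ghost hits.
const hits = [
  { source: "real visit", requestUri: "sdfsdfjdrwrwe90424/contact/" },
  { source: "ghost hit", requestUri: "/" },
  { source: "ghost hit", requestUri: "/no-such-page" },
];

const kept = hits.filter((hit) => passesIncludeFilter(hit.requestUri));
console.log(kept.map((hit) => hit.source));
```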

Even if they somehow find out my security key, I can change it at any time. The only way to generate the encoded request URI is to actually visit/crawl my website. Since ghost referrers don’t actually visit a website, they all would be excluded from my reports.

If they do crawl my website, then they are not ghost referrers but spambots, and I can then block them through the .htaccess file.

#4 Decode Request URI within Google Analytics

Decoding a request URI means removing the security key from the request URI.

Decoding is required in order to understand GA reports. If my report is full of encoded request URIs then it would become difficult to interpret the data and my reports would look something like the one below:

[Screenshot: report full of encoded request URIs]

So I need to remove these security keys. I can do that with a simple search and replace view filter.
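
The decoding step amounts to stripping the key from the front of the URI. A minimal sketch of that search and replace, written in JavaScript rather than as a Google Analytics filter (the helper name `decodeRequestUri` is hypothetical):

```javascript
const SECURITY_KEY = "sdfsdfjdrwrwe90424"; // example key from the text

// Remove the security key only when it appears at the very start of the
// request URI; any other URI is returned unchanged.
function decodeRequestUri(requestUri) {
  return requestUri.startsWith(SECURITY_KEY)
    ? requestUri.slice(SECURITY_KEY.length)
    : requestUri;
}

console.log(decodeRequestUri("sdfsdfjdrwrwe90424/contact/"));
```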

I followed the steps below to implement this concept:

#1 Modify the Google Analytics tracking code on my website so that it starts sending encoded URIs

<script>
// Standard analytics.js loader snippet.
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-30449797-1', 'auto');
// Send the security key followed by the pathname as the page value.
ga('send', 'pageview', '<enter your security key>' + location.pathname);

</script>

Choose any long alphanumeric string as your security key and make a note of it.

So, for example, if you choose dhffjsrr12353fdf4253kc as the security key, your Google Analytics tracking code would look like the one below:

<script>
// Standard analytics.js loader snippet.
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-30449797-1', 'auto');
// Send the security key followed by the pathname as the page value.
ga('send', 'pageview', 'dhffjsrr12353fdf4253kc' + location.pathname);

</script>

Here, location.pathname returns the pathname of the current URL, and what I am doing is prepending the security key to the pathname.

That’s how I can modify the request URI.

#2 Create a view filter which only includes pages with security keys

I created a custom include filter with the following configuration:

[Screenshot: custom include filter configuration]

Here the filter pattern is a regular expression that matches request URIs starting with the security key.
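
The exact filter configuration is in the screenshot and is not reproduced here. As an illustration of the kind of pattern involved, the following JavaScript sketch builds a regular expression anchored to the start of the request URI. The helper names `escapeRegExp` and `includePattern` are assumptions, and the regular expression syntax accepted by Google Analytics filters may differ from JavaScript's:

```javascript
// Escape characters that have a special meaning in regular expressions,
// so the key is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches request URIs starting with the key.
function includePattern(securityKey) {
  return new RegExp("^" + escapeRegExp(securityKey));
}

const pattern = includePattern("dhffjsrr12353fdf4253kc");
console.log(pattern.source);
console.log(pattern.test("dhffjsrr12353fdf4253kc/contact/"));
console.log(pattern.test("/contact/"));
```

Escaping matters only if the chosen key contains characters such as a dot or a plus sign; for a purely alphanumeric key the pattern is simply the key preceded by `^`.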

#3 Create a search and replace filter which removes the security key from the request URI

I created a search and replace filter with the following configuration:

[Screenshot: search and replace filter configuration]

What actually happened and what I learned

Whenever the request URI was sent via the measurement protocol, Google Analytics just couldn’t block it, whether or not it contained the security key. In other words, when a request URI is sent via the measurement protocol, Google Analytics filters based on the request URI just don’t work.

Even if you use a custom exclude filter to block the request URI of the home page (/), which is frequently targeted by spammers, the filter only succeeds in excluding all of the traffic going to the home page. It still can’t block the fake request URI sent via the measurement protocol.

Spammers don’t send an equal number of fake hits to every web property. Some web properties get more fake hits than others. Spammers also don’t send an equal number of fake hits every day. Fake hits increase or decrease depending on the day of the week. For example, spammers seem to be less active during weekends, as if they take a break, which suggests that the script used to abuse the measurement protocol is run manually.

Until Google makes the measurement protocol more secure, the only effective way to block ghost referrers is by including only the traffic from valid hostnames using the custom include filter, as described in this article: Geek guide to removing referrer spam in Google Analytics
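
The idea behind a valid hostname filter can be sketched as follows. This is only an illustration of the matching logic, not the filter configuration from the linked article; the hostnames in `VALID_HOSTNAMES` are hypothetical placeholders for the hostnames that legitimately run your tracking code:

```javascript
// Hypothetical list of hostnames that legitimately run the tracking code.
const VALID_HOSTNAMES = ["www.example.com", "example.com"];

// Build a pattern that matches any of the valid hostnames exactly.
const escaped = VALID_HOSTNAMES.map((h) => h.replace(/\./g, "\\."));
const hostnamePattern = new RegExp("^(" + escaped.join("|") + ")$");

// Keep a hit only if its hostname is on the valid list.
function isValidHostname(hostname) {
  return hostnamePattern.test(hostname);
}

console.log(isValidHostname("www.example.com"));
console.log(isValidHostname("igotyou.com"));
```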

Related Article: Tracking true referrals in Google Analytics when using PayPal and other payment gateways

About the Author

Himanshu Sharma

  • Founder, OptimizeSmart.com
  • Over 15 years of experience in digital analytics and marketing
  • Author of four best-selling books on digital analytics and conversion optimization
  • Nominated for Digital Analytics Association Awards for Excellence
  • Runs one of the most popular blogs in the world on digital analytics
  • Consultant to countless small and big businesses over the decade
