Guide to removing referrer spam and fake traffic in Google Analytics

If your Google Analytics property is receiving referrer spam, ghost spam or any other type of fake traffic, or you would like to find out whether it is, then this article is for you.

In this article I will show you how to minimise or even completely eliminate the negative impact of fake traffic on your GA reports.

Introduction to fake traffic

In the context of Google Analytics, fake traffic is defined as one or more fake hits sent to your GA property. A ‘hit’ is a user interaction with your website that results in data being sent to your Google Analytics property. A hit can be a ‘pageview’, ‘screenview’, ‘event’, ‘transaction’ etc.

A fake hit is one that is generated by a program or a bot rather than by a living, breathing human being interacting with your website. At present it is possible to fake any GA hit. That means a spammer can send fake referral traffic, fake organic traffic, fake direct traffic, fake traffic from social media etc. A spammer can also fake events, virtual pageviews, screenviews, hostnames, request URIs, keywords and even transaction and item data.

With adequate knowledge of the measurement protocol, it is possible to inflate, deflate or completely delete all of the sales data from any GA property.

A spammer/hacker just needs your GA property ID to do his dirty work. He can then practically rewrite your analytics data from anywhere in the world without any access to your GA account. This is a big data security risk which many people are not aware of. Even the premium version of Google Analytics does not protect you from being hacked/spammed.

 

Who could possibly benefit from sending fake traffic?

Affiliates are the most likely to benefit from sending fake traffic, as they get commission. Internet marketers (particularly SEOs) can also benefit from sending fake traffic.

It is not very hard to artificially inflate organic search traffic in GA and then boast about one’s marketing efforts in front of a client or boss.

In fact, anyone who can benefit financially, in any shape or form, from sending fake traffic can send fake hits to your GA property. More recently, fake GA hits have also been used to spread propaganda, in the form of language spam urging people to vote for Donald Trump in the US election.

 

It is all about ‘bots’

A bot is a program developed to perform repetitive tasks with a high degree of accuracy and speed. Bots are generally used for web indexing (indexing the contents of websites). But they are also widely used for malicious purposes, such as to:

  1. Commit click fraud (to increase advertising revenue or deplete competitors’ advertising budgets)
  2. Harvest email addresses (for mass spamming)
  3. Create fake user accounts
  4. Submit spam comments
  5. Scrape website content (to create spam websites that host AdSense ads)
  6. Spread malware (for advertising and extorting ransom from webmasters)
  7. Scrape Google Analytics IDs for sending fake traffic
  8. Send fake website traffic etc.

Thus, depending on how a bot is used, it can be a good bot or a spam bot. An example of a good bot is ‘googlebot’, which Google uses to crawl and index web pages on the internet. Good bots obey robots.txt directives but spam bots don’t. Spam bots can use various methods to disguise themselves so that they can’t be easily detected by security measures. They can pretend to be a web browser (like Chrome, Internet Explorer etc). They can pretend to be traffic coming from a legitimate website.

Not all spam bots are developed to send fake traffic to Google Analytics. But whether or not they skew your analytics data, they can still eat up your website bandwidth and negatively affect your website performance. In a worst case scenario they can be used to hack your website or infect it with malware. In the context of Google Analytics, there are two types of spam bots:

  1. Spam bots which visit websites
  2. Spam bots which do not visit websites

 

Spam bots which visit websites (first generation bots)

These bots actually visit websites in order to send fake traffic (mainly fake referral traffic). They can crawl hundreds or even thousands of websites every day and send HTTP requests to those websites with a fake referrer header. They create and send fake referrer headers to avoid being detected as bots. The fake referrer header contains the URL of the website which the spammer wants to promote and/or build backlinks to. For example, spam bots may use ‘bbc.co.uk’ as a fake referrer. Because the BBC is a legitimate website, when you see that referrer in your report you won’t suspect that the traffic could be fake and that no one actually visited your website from the BBC.

When your website receives an HTTP request from a spam bot with a fake referrer header, it is immediately recorded in your server log. Many SEOs use such spam bots for link building. They spam in the belief that if a server log is publicly accessible (i.e. it can be crawled and indexed by Google) then Google treats the referrer value in the server log as a backlink, thus positively influencing the search engine ranking of the website being promoted. But I am confident that Google is smart enough to detect that what it is crawling is a log file and not a real web page, and thus devalues all backlinks from server logs.

These spam bots have the ability to execute JavaScript and are thus able to avoid the bot filtering methods used by Google Analytics. Because of this ability, you can see traffic from such spam bots in your Google Analytics ‘Referrals’ reports. For example, bots from semalt and buttons-for-website actually visit your website and send HTTP requests to it with a fake referrer header.

 

How to find referrer spam in Google Analytics

Follow the steps below to detect and fix referrer spam:

Step-1: Navigate to ‘Referrals’ report in your GA view.

Step-2: Change the date range of the ‘Referrals’ report to the last two months.

Step-3: Sort the report by bounce rate in descending order, or use the following regex (not foolproof) to isolate the spam referrers in the ‘Referrals’ report:

semalt|button|ilovevitaly|darodar|hulfingtonpost|ranksonic|[0-9]{1,3}\.[0-9]{1,3}|website|[0-9][a-z]|free|click|blackhatworth|makemoneyonline|priceg|best-seo-offer|familyfocusblog|traffic|anal-acrobats|buy-cheap-online|deximedia|webmaster|link|event-tracking|discover-results|fwdservice|pornhub-forum


Step-4: Look for referrers with 100% or 0% bounce rate and 10 or more sessions. They are most likely spam referrers.
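The checks in Step-3 and Step-4 can also be run outside GA on an export of the ‘Referrals’ report. The following is only a rough sketch: the pattern is a shortened, illustrative version of the regex above, and the rows (referrer, sessions, bounce rate) are made-up sample data, not real report output.

```python
import re

# Shortened, illustrative spam pattern (an assumption, not an exhaustive list).
SPAM_PATTERN = re.compile(r"semalt|button|ilovevitaly|darodar|ranksonic", re.IGNORECASE)


def flag_suspicious(rows, min_sessions=10):
    """Return referrers that match the spam pattern or show an extreme bounce rate.

    rows: iterable of (referrer, sessions, bounce_rate_percent) tuples,
    for example copied from an export of the 'Referrals' report.
    """
    flagged = []
    for referrer, sessions, bounce_rate in rows:
        # Step-4 heuristic: 100% or 0% bounce rate with 10 or more sessions.
        extreme_bounce = sessions >= min_sessions and bounce_rate in (0.0, 100.0)
        if SPAM_PATTERN.search(referrer) or extreme_bounce:
            flagged.append(referrer)
    return flagged


# Hypothetical sample rows.
rows = [
    ("semalt.semalt.com", 42, 100.0),
    ("bbc.co.uk", 15, 38.5),
    ("unknown-site.example", 12, 0.0),
    ("news.example", 3, 100.0),
]
print(flag_suspicious(rows))
```

Anything flagged this way is only a candidate. You still need to confirm it by hand as described in the next steps.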

Note: Exhaustive list of spam referrers can be found here: https://perishablepress.com/blacklist/ultimate-referrer-blacklist.txt

Step-5: If you cannot confirm the identity of a suspicious looking referrer, then you need to take the risk and visit the website to check whether it is a legitimate website and whether it is actually linking out to your website. Make sure that you have anti-virus/anti-malware software installed on your machine before you visit such websites, as they may infect your machine as soon as you visit them.

Use the Google Chrome web browser to visit suspicious looking websites. Chrome detects ‘malware deploying websites’ faster than any other web browser or malware scanner I know. So if you use Chrome, your machine is less likely to get infected when you visit a suspicious looking website listed in your GA ‘Referrals’ reports.

Step-6: Make a note of all of the spam referrers whose traffic you want to block from your Google Analytics view.

Step-7: Convert the list of your spam referrers into a regular expression. For example, if the following is the list of spam referrers you discovered:

  • semalt.com
  • semalt.semalt.com
  • buttons-for-website.com
  • blackhatworth.com
  • 7makemoneyonline.com

then the corresponding regex could be:

semalt\.com|buttons-for-website\.com|blackhatworth\.com|7makemoneyonline\.com

We will later use this regex while setting up GA view filter. Create a regex which can correctly identify all of the spam referrers whose traffic you want to exclude in your GA view.
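If your list of spam referrers is long, you can build the regex with a small script instead of by hand. This is a minimal sketch, assuming the list contains plain domain names. The function name `referrers_to_regex` is made up for illustration, and only dots are escaped because letters, digits and hyphens need no escaping in this position.

```python
import re


def referrers_to_regex(domains):
    """Join domains into one alternation, escaping dots so they match literally."""
    unique = sorted(set(d.strip().lower() for d in domains if d.strip()))
    return "|".join(d.replace(".", r"\.") for d in unique)


spam_referrers = [
    "semalt.com",
    "semalt.semalt.com",
    "buttons-for-website.com",
    "blackhatworth.com",
    "7makemoneyonline.com",
]
pattern = referrers_to_regex(spam_referrers)
print(pattern)

# Quick sanity check before pasting the pattern into a GA filter.
for referrer in ["semalt.semalt.com", "bbc.co.uk"]:
    print(referrer, bool(re.search(pattern, referrer)))
```

Because the pattern is matched anywhere in the referrer, the ‘semalt\.com’ entry already covers ‘semalt.semalt.com’. The longer entry is harmless but redundant.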

If you are brand new to regular expressions then read this article: Regular Expressions Guide for SEO, Google Analytics & Google Tag Manager

 

How to block referrer spam

Once you have identified spam referrers, block them ASAP from visiting your website again. Since the bot visit is recorded in your server log, you can block such bots through .htaccess file (or equivalent).

The following are the various methods you can use to block referrer spam:

  1. Block the referrer used by the spam bot
  2. Block the IP address used by the spam bot
  3. Block the IP address range used by the spam bot
  4. Block the user agents used by spam bots
  5. Block spam referrers through a custom advanced filter in GA (only if you cannot get server access)
  6. Use the Google Analytics ‘Bot filtering’ feature.

 

Method #1: Block the referrer used by spam bot

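Access your .htaccess file and add the following code to block all http and https referrals from semalt.com and all of its subdomains:

# Allow mod_rewrite to run in this directory, then switch it on
Options +FollowSymLinks
RewriteEngine On
# Match http/https referrers from semalt.com and any of its subdomains (case-insensitive)
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com [NC]
# Answer matching requests with 403 Forbidden
RewriteRule .* - [F]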

Create similar code to block the referrer used by other spambots.
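When you need to block several referrers, the conditions are chained with the OR flag and only the last one omits it. The sketch below generates such a block from a list of domains. It is an illustration under my own assumptions (the helper name `htaccess_referrer_block` is hypothetical), so review the output before putting it on a live server.

```python
def htaccess_referrer_block(domains):
    """Build mod_rewrite rules that return 403 for requests with a listed referrer."""
    if not domains:
        # Without any condition the rule would block every request, so emit nothing.
        return ""
    lines = ["RewriteEngine On"]
    for i, domain in enumerate(domains):
        escaped = domain.replace(".", r"\.")
        # Every condition except the last one is chained with OR.
        flags = "[NC,OR]" if i < len(domains) - 1 else "[NC]"
        lines.append(rf"RewriteCond %{{HTTP_REFERER}} ^https?://([^.]+\.)*{escaped} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


print(htaccess_referrer_block(["semalt.com", "buttons-for-website.com"]))
```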

Note: Not all spam bots send referrer headers. In that case traffic from such bots won’t appear as referral traffic in your GA reports. Such traffic would appear as direct traffic in your GA reports making it more difficult to detect them. Whenever a referrer is not passed, the traffic is treated as direct traffic by Google Analytics.

Spam bots can also create dozens of fake referrer headers. So if you block one referrer, they may send your website another fake referrer. So whether you block the spammy referrer through a GA view filter or through .htaccess, there is no guarantee that you have completely blocked the spam bot.

Method #2: Block the IP address used by the spam bot

Access your .htaccess file and add a code like the one below:

# Apache 2.2-style access control; 234.45.12.33 is only a placeholder address
Order Deny,Allow
Deny from 234.45.12.33

Note: Do not copy and paste this code into your .htaccess file as it is; it won’t work for you. This is just an example to show you how to block an IP address in the .htaccess file. Spam bots can come from many different IP addresses, so you need to keep adding the IP addresses used by the spam bots affecting your website.

Tip: Block only those rogue IP addresses which are affecting your website. Do not try to block all known rogue IP addresses, as this will make your .htaccess file very large and hard to manage and will hurt your web server performance. If your list of blacklisted IP addresses keeps getting bigger and bigger, then you have serious website/network security issues. Contact your web host or system administrator. Search Google to find lists of blacklisted IP addresses. You should automate this process by writing a script which can automatically find and ban known rogue IPs.
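Such a script could start from your own server log. The sketch below assumes the common ‘combined’ access log format and a short, hand-picked spam referrer pattern. The sample log lines are invented and use reserved documentation IP addresses. It only prints candidate ‘Deny from’ lines, so you can review them before adding anything to .htaccess.

```python
import re
from collections import Counter

# Assumes the Apache "combined" log format: IP, identity, user, [time],
# "request", status, size, "referrer", "user agent".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Illustrative pattern; replace it with the spam referrers you confirmed.
SPAM_REFERRER = re.compile(r"semalt\.com|buttons-for-website\.com", re.IGNORECASE)


def deny_rules(log_lines, min_hits=2):
    """Return 'Deny from' lines for IPs seen at least min_hits times with a spam referrer."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and SPAM_REFERRER.search(match.group("referrer")):
            hits[match.group("ip")] += 1
    return [f"Deny from {ip}" for ip, count in sorted(hits.items()) if count >= min_hits]


# Invented sample lines for illustration.
sample_log = [
    '203.0.113.7 - - [10/Oct/2016:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "http://semalt.semalt.com/crawler.php" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2016:13:56:01 +0000] "GET /about HTTP/1.1" 200 1200 "http://semalt.semalt.com/crawler.php" "Mozilla/5.0"',
    '198.51.100.4 - - [10/Oct/2016:14:02:11 +0000] "GET / HTTP/1.1" 200 2326 "http://www.bbc.co.uk/news" "Mozilla/5.0"',
]
print(deny_rules(sample_log))
```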

Method #3: Block the IP address range used by spam bot

If you are sure that a particular range of IP addresses is being used by spam bots then you can block the whole IP address range like the one below:

# Apache 2.2-style access control: allow everyone except the listed CIDR range
Order Allow,Deny
Allow from all
Deny from 76.149.24.0/24

Here 76.149.24.0/24 is a CIDR range. CIDR is a notation for representing a range of IP addresses; this one covers the 256 addresses from 76.149.24.0 to 76.149.24.255. Blocking by CIDR is more efficient than blocking individual IP addresses, as a single rule covers the whole range and keeps your .htaccess file smaller.

Tip: You can convert a CIDR to an IP range and vice versa via this tool: http://www.ipaddressguide.com/cidr
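If you prefer to do the conversion offline, Python’s standard `ipaddress` module can do it in both directions. This is a small sketch; the two helper names are my own.

```python
import ipaddress


def cidr_to_range(cidr):
    """Return the first and last IP address covered by a CIDR block."""
    network = ipaddress.ip_network(cidr)
    return str(network[0]), str(network[-1])


def range_to_cidrs(first, last):
    """Return the smallest list of CIDR blocks that exactly covers an IP range."""
    blocks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(block) for block in blocks]


print(cidr_to_range("76.149.24.0/24"))
print(range_to_cidrs("76.149.24.0", "76.149.25.255"))
```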

If the spam bot is using a botnet (a network of infected computers located in a particular region or spread around the world), then it can access your website via hundreds of different IP addresses, making IP blacklisting or rate limiting (limiting the rate of traffic sent or received) pretty much useless.

The ability of a spam bot to skew your website traffic is directly proportional to the size of the botnet it is using. The bigger the botnet, the more IP addresses a spam bot can use to access your website without being blocked by a firewall or other traditional safety mechanisms. Many spam bots are designed to infect your computer with malware in order to make your machine part of their botnet. Once your computer becomes part of a botnet, it is used to forward spam, viruses and other malicious programs to other computers on the internet.

There are hundreds of thousands of computers all over the world which are used by real people and which are part of a botnet. Your own computer could be part of a botnet without you knowing about it. So if you decide to block a botnet, you will most likely also block traffic coming from real people.

If spam bot traffic is considerably skewing your website traffic in spite of regular IP address blocking, then consider investing in ‘penetration testing’ or a ‘bot protection service’.

Method #4: Block the user agents used by spam bot

Go through your server log files once a week and find and ban malicious user agents (user agents used by spam bots). Blocked user agents cannot access your website. You can block a rogue user agent with rules like the ones below:

Options +FollowSymLinks
RewriteEngine On
# Match any user agent containing the given string (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
# Answer matching requests with 403 Forbidden and stop processing further rules
RewriteRule .* - [F,L]

A simple search on Google will turn up several websites which maintain records of known rogue user agents. Use these records to identify rogue user agents on your website. You should write a script to automate this process. Maintain a database of all known rogue user agents and then use your script to automatically identify and block user agents.

Keep your database up to date, as new rogue user agents keep popping up and old ones keep disappearing. Block only those rogue user agents which are affecting your website. Do not try to block all known rogue user agents. Otherwise your .htaccess file will become very large and hard to manage, and this will hurt your web server performance.
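To block only the rogue user agents that actually hit your website, a script can compare the user agents in your log with your blocklist and generate rules just for the ones it finds. This is a sketch under assumptions: the blocklist entries and the sample user agent strings are invented, and in practice the blocklist would come from your own database.

```python
import re

# Hypothetical blocklist; in practice load it from your own database.
ROGUE_AGENT_FRAGMENTS = ["examplespambot", "badcrawler"]


def rogue_agents_seen(user_agents, fragments=ROGUE_AGENT_FRAGMENTS):
    """Return the blocklist fragments that actually appear in the logged user agents."""
    seen = set()
    for agent in user_agents:
        lowered = agent.lower()
        for fragment in fragments:
            if fragment in lowered:
                seen.add(fragment)
    return sorted(seen)


def htaccess_agent_block(fragments):
    """Build mod_rewrite rules that return 403 for the given user agent fragments."""
    if not fragments:
        return ""
    lines = ["RewriteEngine On"]
    for i, fragment in enumerate(fragments):
        # Every condition except the last one is chained with OR.
        flags = "[NC,OR]" if i < len(fragments) - 1 else "[NC]"
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} {re.escape(fragment)} {flags}")
    lines.append("RewriteRule .* - [F,L]")
    return "\n".join(lines)


# Invented user agent strings, as they might be extracted from a server log.
logged_agents = [
    "Mozilla/5.0 (compatible; ExampleSpamBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0) Chrome/55.0",
]
seen = rogue_agents_seen(logged_agents)
print(htaccess_agent_block(seen))
```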

Tip: Get help from your system administrator. Protecting your client’s or company’s website from malicious mischief is a 24/7 activity and is not really your job. Your system administrator, or whoever is in charge of network security, is the best person to deal with spam bot attacks. So whenever you discover a new spam bot, inform them.

Method #5: Block spam referrers through custom advanced filter in Google Analytics

If for some reason you are not allowed to edit the .htaccess file, then you can block the spam referrers through a custom advanced filter in GA. However, do not use this method if you can edit the .htaccess file. Monitor your server logs at least once a week. The fight against spam bots that crawl your website starts at the server level.

If you can stop them from visiting your website in the first place, you don’t need to exclude them later from your GA reports. Blocking spam bots at the server level is always more effective, as you are actually blocking them from visiting the website and not just excluding their traffic from GA. You should minimise the use of view filters as much as possible, as they can create data sampling issues in GA.

Follow the steps below:

Step-1: Create a copy of your main Google Analytics (GA) view. We will test our filter on that view. If the filter seems to be working correctly in that view, then apply the same filter to your main view. You need to take this precaution because if you accidentally apply incorrect filters to your main view, your web analytics data may be corrupted for good.

Step-2: Navigate to the ‘Admin’ section of your main GA view and then click on the ‘View Settings’ link.

Step-3: Click on the ‘Copy View’ button.

Step-4: Name your new view ‘your brand name’ + Test View (for example, ‘OptimizeSmart Test View’) and then click on the ‘Copy view’ button.

Step-5: Navigate to the ‘Admin’ section of your test view and then click on the ‘Filters’ link.

Step-6: Click on the ‘Add filter’ button.

Step-7: Create a new custom exclude filter and copy-paste the regex you created earlier in the ‘Filter pattern’ text box.

This filter should block all of the traffic from the spam referrers you identified.

Step-8: Click on the ‘Verify this filter’ link and then click on the ‘Save’ button.

You may then see the filter test results.

Note: If your test view has little to no data then the ‘Verify this filter’ setting may not work and you may see the following notification:

“This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.”

If you get such a notification, just ignore it and click on the ‘Save’ button to complete the filter setup. Once you have created this filter, it will appear in the filters list.

Step-9: If your filter is working as expected then apply it to your main view.

Note: Do not exclude the referrer spam websites from the referral traffic via ‘Referral exclusion list’. This will not solve your problem. It will just hide the problem, as then the traffic from spambot will appear as direct traffic in your GA reports and you will no longer be able to measure the impact of spambots on your website traffic.

Method #6: Use the Google Analytics ‘Bot filtering’ feature 

Follow the steps below:

Step-1: Navigate to the ‘Admin’ section of your main GA view and then click on the ‘View Settings’ link.

Step-2: Scroll down the page and then select the checkbox ‘Exclude all hits from known bots and spiders’.

Spam bots which do not visit websites (second generation bots)

These spam bots (like darodar.com) can send fake traffic without even visiting your website. They do that by sending raw fake hit data (commonly known as ghost traffic) directly to the Google Analytics servers via the Measurement Protocol. All they need is your GA property ID. They can obtain property IDs in two ways that I know of:

  1. Through spam bots which crawl websites and scrape GA property IDs.
  2. By randomly generating property IDs

People who do not use Google Tag Manager leave their Google Analytics tracking code hard coded on their web pages. The hard coded Google Analytics tracking code contains your web property ID. This ID can be scraped by spam bots and could be shared with other spam bots. There is no guarantee that the bot which scraped your web property ID and the bot which sent you fake traffic are the same bot.

For this reason, there is no guarantee that your GA property won’t get any fake traffic just because your property ID does not end in ‘1’ (a common misconception). I have seen many GA properties receive fake traffic even though their property IDs do not end in ‘1’. You can fix this issue to an extent by using Google Tag Manager (GTM), which hides the property ID, at least from the page source code.
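To see what a scraping bot would see, you can check whether your property ID is visible in the HTML your server returns. The sketch below looks for IDs in the ‘UA-XXXXX-Y’ format in a page’s source. The two HTML snippets are simplified, made-up examples, and the GTM container ID is a placeholder.

```python
import re

# Universal Analytics property IDs have the form UA-<account>-<property>.
PROPERTY_ID = re.compile(r"\bUA-\d+-\d+\b")


def exposed_property_ids(html):
    """Return the distinct property IDs found in a page's source code."""
    return sorted(set(PROPERTY_ID.findall(html)))


# Simplified, made-up page sources for illustration.
hard_coded = "<script>ga('create', 'UA-12345-1', 'auto'); ga('send', 'pageview');</script>"
via_gtm = "<script src='https://www.googletagmanager.com/gtm.js?id=GTM-XXXX'></script>"

print(exposed_property_ids(hard_coded))
print(exposed_property_ids(via_gtm))
```

An empty result only means the ID is not in the page source. It does not protect you from bots that generate property IDs at random.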

Since these spam bots do not visit your website, their visits are not recorded in your server log. And since their visits are not recorded in your server log, you cannot block them by any of the traditional methods: IP blocking, user agent blocking, referrer blocking etc.

Related Article: What I learned from trying to fix the Ghost Referrer Spam in Google Analytics

 

Hostnames and their role in blocking ghost traffic

In the URL:

https://www.optimizesmart.com/

The part of the URL ‘www.optimizesmart.com’ is called the ‘hostname’.

When a user (including a spam bot which crawls websites) really visits your website from another website, the hostname (in most cases) points to your domain name. But when a fake visit is recorded for your website by Google Analytics, the hostname is usually either blank (i.e. not set) or points to a domain name other than yours.

For example, if a user clicks a link on a page hosted on the ‘bbc.co.uk’ website and then visits your website, say ‘www.abc.com’, then Google Analytics will record and report the hostname as ‘www.abc.com’. But in the case of a fake visit that seems to be coming from the ‘bbc.co.uk’ website, Google Analytics either does not report the hostname or reports a hostname other than your website name. This happens because spammers generally target Google Analytics properties at random. They generally do not know your website name (hostname) unless they are specifically targeting your website or using spam bots which crawl websites. So they either fake the hostname or leave the hostname field blank. Whenever Google Analytics is not able to track the hostname, it reports it as ‘not set’.

That means if you include traffic from only those hostnames which you recognise in your GA view, you can greatly minimise the impact of ghost traffic on your reports. Any website where you are using your GA property ID (example: ‘UA-12345-1’) is a valid hostname. This can also include the domain name where you may have hosted your shopping cart. You need to identify all such valid hostnames.

One valid hostname which we always use is the one pointing to our own website. You definitely want to keep all the traffic coming from the hostname that points to your website in your GA view.
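The hostname check can be illustrated on raw hit payloads. In the Universal Analytics Measurement Protocol, the hostname arrives either in the ‘dh’ parameter or as part of the full page URL in ‘dl’. The sketch below only parses payload strings and never sends anything. The payloads and the valid hostname set are invented examples, and GA itself applies this logic through view filters, not through code like this.

```python
from urllib.parse import parse_qs, urlparse

# Hostnames you recognise as yours (example value taken from the text above).
VALID_HOSTNAMES = {"www.abc.com"}


def hit_hostname(payload):
    """Extract the hostname from a Measurement Protocol payload, or '' if absent."""
    params = parse_qs(payload)
    if "dh" in params:
        return params["dh"][0].lower()
    if "dl" in params:
        return (urlparse(params["dl"][0]).hostname or "").lower()
    return ""


def is_ghost_hit(payload, valid_hostnames=VALID_HOSTNAMES):
    """A hit is treated as ghost traffic when its hostname is missing or unknown."""
    return hit_hostname(payload) not in valid_hostnames


# Invented payloads for illustration.
real_hit = "v=1&tid=UA-12345-1&cid=555&t=pageview&dl=https%3A%2F%2Fwww.abc.com%2Fhome&dr=http%3A%2F%2Fbbc.co.uk%2F"
no_hostname = "v=1&tid=UA-12345-1&cid=556&t=pageview&dp=%2F&dr=http%3A%2F%2Fbbc.co.uk%2F"
fake_hostname = "v=1&tid=UA-12345-1&cid=557&t=pageview&dh=spam.example&dp=%2F"

for payload in (real_hit, no_hostname, fake_hostname):
    print(hit_hostname(payload) or "(not set)", is_ghost_hit(payload))
```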

 

How to find ghost traffic in Google Analytics

Ghost hits can appear in any Google Analytics report. The following are the reports where they are most likely to be found:

  • ‘Referral Traffic’ Report (under ‘Acquisition’ > ‘All Traffic’)
  • ‘Top Events’ Report (under ‘Behavior’ > ‘Events’)
  • ‘Keyword’ Report (under ‘Acquisition’ > ‘All Traffic’ > Source / Medium)
  • ‘Landing Pages’ Report (under ‘Behavior’ > ‘Site Content’)

 

How to block ghost traffic in Google Analytics

Follow the steps below:

Step-1: Navigate to your main GA view (the view that you regularly use for analysing your website traffic).

Step-2: Navigate to the ‘Network’ report (under ‘Audience’ > ‘Technology’).

Step-3: Click on the ‘Hostname’ primary dimension.

Step-4: Set the date range of your report to the last 3 months.

Step-5: Make a note of all of the ‘hostnames’ whose traffic you want to include in your GA view.

Step-6: Convert the list of your hostnames into a regular expression. For example, if the following is the list of your hostnames:

  • www.optimizesmart.com
  • www.optimizesmart.com.googleweblight.com
  • translate.googleusercontent.com
  • webcache.googleusercontent.com

then the corresponding regex could be:

^(www\.optimizesmart\.com(\.googleweblight\.com)?|(translate|webcache)\.googleusercontent\.com)$

We will later use this regex while setting up GA view filter. Create a regex which can correctly identify all of the hostnames whose traffic you want to capture in your GA view.
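Before you paste a hostname regex into a filter, it helps to test it against hostnames you want to keep and hostnames you want to drop. The sketch below uses an anchored pattern for the four example hostnames. The pattern is one possible way to write it, and the rejected hostnames are typical examples rather than a complete list.

```python
import re

# Anchored so that a hostname merely containing a valid name is not accepted.
VALID_HOSTNAME = re.compile(
    r"^(www\.optimizesmart\.com(\.googleweblight\.com)?"
    r"|(translate|webcache)\.googleusercontent\.com)$"
)


def keep_hit(hostname):
    """Return True if the hostname would pass the include filter."""
    return bool(VALID_HOSTNAME.search(hostname))


wanted = [
    "www.optimizesmart.com",
    "www.optimizesmart.com.googleweblight.com",
    "translate.googleusercontent.com",
    "webcache.googleusercontent.com",
]
unwanted = ["(not set)", "localhost", "amazonaws.com", "www.optimizesmart.com.evil.example"]

for hostname in wanted + unwanted:
    print(hostname, keep_hit(hostname))
```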

Step-7: Navigate to the ‘Admin’ section of your test view and then click on the ‘Filters’ link.

Step-8: Click on the ‘Add filter’ button.

Step-9: Create a new custom include filter and copy-paste the regex you created earlier in the ‘Filter pattern’ text box.

Step-10: Click on the ‘Verify this filter’ link. You may then see the filter test results.

Note: If your test view has little to no data then the ‘verify this filter’ setting may not work and you may see following notification:

“This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.”

If you get such notification then just ignore it and click on the ‘save’ button to complete the filter set up.

Traffic from the following hostnames is fake:

dev3|localhost|atpnet.local|ingressosrca.com.br|justcrunchit.com|basnodes.com|amazonaws.com|stfi.re

Step-11: If your filter is working as expected then apply it to your main view.

 

Not every website is equally affected by spam bots

This is because spam bots are designed to detect and exploit a website’s vulnerabilities. They attack the weak and they attack often. So if your website is hosted on a cheap shared hosting platform or uses a custom CMS/shopping cart, you are more likely to be attacked. Custom CMS/shopping carts are often not rigorously tested to find and fix vulnerabilities in the application. So it is wise to use a reputable hosting provider, CMS and shopping cart solution.

If you are running affiliate marketing campaigns on a large scale, your website is more likely to be assaulted by spam bots. So choose your affiliates wisely. I used ‘Godaddy’ for hosting my websites. It is not that Godaddy is cheap or some third class web host, but for as long as I used their service, my website was under constant threat from bad bots deploying malware and was compromised often.

I spent months fighting malware on my website when it was hosted on Godaddy. This prompted me to write this article on finding and fixing malware: Malware Removal Checklist for WordPress – DIY Security Guide. It may help you in avoiding referral spam. Website security is not something which really excites me. But Godaddy made me learn every trick in the book to fight malware. When I changed my hosting provider, all the attacks stopped. Now I am not saying that all websites hosted by Godaddy are vulnerable. But this was certainly the case in my situation. So if your website is often attacked by bad bots, then changing your web host may help you.

Also consider using a firewall. It acts as a filter between your computer/web server and the internet and can protect your website from spam bots. If you work for a large organization, you are most likely already using a firewall.

 

List of widely known spam referrers

If one of the suspicious looking referrers belongs to the list of websites below, then it is a spam referrer and you do not need to visit the website to confirm that:

  1. semalt.com
  2. semalt.semalt.com
  3. buttons-for-website.com
  4. blackhatworth.com
  5. 7makemoneyonline.com
  6. ilovevitaly.com
  7. ilovevitaly.co
  8. ilovevitaly.ru
  9. iloveitaly.ro
  10. priceg.com
  11. prodvigator.ua
  12. resellerclub.com
  13. savetubevideo.com
  14. screentoolkit.com
  15. kambasoft.com
  16. socialseet.ru
  17. superiends.org
  18. vodkoved.ru
  19. o-o-8-o-o.ru
  20. iskalko.ru
  21. luxup.ru
  22. myftpupload.com
  23. websocial.me
  24. ykecwqlixx.ru
  25. slftsdybbg.ru
  26. seoexperimenty.ru
  27. darodar.com
  28. econom.co
  29. edakgfvwql.ru
  30. adcash.com
  31. adviceforum.info
  32. hulfingtonpost.com
  33. europages.com.ru
  34. gobongo.info
  35. cenoval.ru
  36. cityadspix.com
  37. cenokos.ru
  38. ranksonic.info
  39. lomb.co
  40. lumb.co
  41. 54.186.60.77
  42. srecorder.com
  43. see-your-website-here.com
  44. 76brighton.co.uk
  45. paparazzistudios.com.au
  46. powitania.pl
  47. sharebutton.net
  48. tasteidea.com
  49. descargar-musica-gratis.net
  50. torontoplumbinggroup.com

Use Google Analytics custom alerts to detect spam bot traffic

Use custom alerts to monitor unusual spikes in daily traffic, especially direct and referral traffic. If you use custom alerts in GA, you can quickly detect and fix bad bot issues and thus minimise their impact.
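Custom alerts are set up in the GA interface, but the comparison behind a ‘sessions increased by more than X% compared to the previous day’ alert is simple enough to sketch. The 50% threshold and the daily figures below are made-up values for illustration only.

```python
def traffic_spikes(daily_sessions, threshold_percent=50.0):
    """Return the dates on which sessions rose by more than the threshold
    compared to the previous day.

    daily_sessions: list of (date_string, sessions) tuples in date order.
    """
    spikes = []
    for (_, previous), (day, current) in zip(daily_sessions, daily_sessions[1:]):
        if previous > 0 and (current - previous) / previous * 100 > threshold_percent:
            spikes.append(day)
    return spikes


# Hypothetical daily referral sessions.
referral_sessions = [
    ("2016-11-01", 120),
    ("2016-11-02", 130),
    ("2016-11-03", 410),
    ("2016-11-04", 125),
]
print(traffic_spikes(referral_sessions))
```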

Use Annotation on your Google Analytics charts

Create an annotation on your chart and write a note explaining what caused the unusual traffic spike. You then need to discount this traffic from your analysis.


 

Himanshu Sharma

Certified web analyst and founder of OptimizeSmart.com

My name is Himanshu Sharma and I help businesses find and fix their Google Analytics and conversion issues. If you have any questions or comments please contact me.

  • Over eleven years' experience in SEO, PPC and web analytics
  • Google Analytics certified
  • Google AdWords certified
  • Nominated for Digital Analytics Association Award for Excellence
  • Bachelors degree in Internet Science
  • Founder of OptimizeSmart.com and EventEducation.com
