Google Analytics Data Sampling – Complete Guide

 

Did you know that your Google Analytics metrics, from visits, revenue and transactions to e-commerce conversion rate, depend upon the size of the traffic data sample being analyzed by Google?

This means that if you change the size of the sample being analyzed, your e-commerce conversion rate could change and the revenue reported by Google Analytics could change, especially if your website gets a lot of traffic. By a lot of traffic I mean millions of pageviews a month.

For example, let us suppose you want to calculate the ‘e-commerce conversion rate’ of your organic search. One way is to apply the ‘non-paid search traffic’ advanced segment to the ‘All Traffic’ report:

The e-commerce conversion rate of the ‘non-paid search traffic’ here is 1.39%. Now let us change the sample size by clicking on the ‘checkboard’ button and then dragging the slider to the extreme left:

Now wait for your report to load back up.

You can now see that the e-commerce conversion rate of the ‘non-paid search traffic’ is 2.08%.

Note that not only the e-commerce conversion rate has changed: all the other metrics, from visits, revenue and transactions to per visit value, have changed too. Some metrics like ‘visits’ have increased while other metrics like ‘average order value’ have gone down.

Similarly, if you click on the ‘checkboard’ button again and then drag the slider to the extreme right, some or all of your metrics may change once again. Now the question is, why is this happening?

The answer is ‘sample size’. We have changed the sample size and sample size impacts the Google Analytics calculations of various metrics.

Google Analytics selects only a subset of data (called sample) from your website traffic to produce reports. This process is known as data sampling.

Sampling is widely used in statistical analysis to analyze large data sets in a cost efficient manner and in a reasonable amount of time. This is why Google Analytics does data sampling.

Google Analytics has an upper limit on the amount of traffic data it will sample to produce reports.

This limit has been set to save resources (computation power and cost).

 

“As long as the sample is a good representative of all of the data, analyzing a subset of data (or sample) gives similar results to analyzing all of the data.”

But what if the selected sample is not a good representative of all of the data, or what if the selected sample is too small to make accurate estimates?

In such a case you get inaccurate traffic estimates from Google Analytics, as I have just demonstrated above. I minimized the size of the sample through the ‘checkboard’ button and Google reported a different set of metrics.

We call this a data sampling issue. When Google Analytics is sampling your data badly, you can’t rely on the metrics it reports. Any marketing decision based on such reports could result in a huge monetary loss.
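The effect of sample size on an estimated conversion rate can be illustrated with a small simulation. This is only a sketch: it assumes simple random sampling of visits, which is not necessarily how Google Analytics selects its sample, and the function names, the traffic volume and the 1.39% "true" rate are made up for illustration.

```python
import random

def simulate_visits(n_visits, true_rate, seed=0):
    """Build a synthetic list of visits; True means the visit converted."""
    rng = random.Random(seed)
    return [rng.random() < true_rate for _ in range(n_visits)]

def estimated_rate(visits, sample_size, seed=0):
    """Estimate the conversion rate from a simple random sample of visits."""
    rng = random.Random(seed)
    sample = rng.sample(visits, sample_size)
    return sum(sample) / sample_size

def spread_of_estimates(visits, sample_size, repeats=50):
    """Return (lowest, highest) estimate over repeated samples of one size."""
    estimates = [estimated_rate(visits, sample_size, seed=s) for s in range(repeats)]
    return min(estimates), max(estimates)

if __name__ == "__main__":
    # Hypothetical site: 200,000 visits with a true conversion rate of 1.39%.
    visits = simulate_visits(n_visits=200_000, true_rate=0.0139)
    for size in (1_000, 10_000, 100_000):
        low, high = spread_of_estimates(visits, size)
        print(f"sample of {size:>7} visits: estimates range from {low:.2%} to {high:.2%}")
```

Running it should show the range of estimates narrowing as the sample grows, which is the same behaviour as moving the slider from the extreme left to the extreme right.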

 

When does Google Analytics start sampling the data?

Google Analytics starts sampling the data when calculating the result for a report in the following cases:

1. Whenever you apply an advanced segment or a secondary dimension to a Google Analytics report (regardless of the size of the website being analyzed).

2. Whenever you view a report that is based on more than 500k visits.

3. Whenever you view a multi channel funnel report that has more than 1 million conversions.

4. Whenever you view a flow visualization report that is based on more than 100k visits.

5. Whenever you query for data that is not available in aggregate. This is quite common in the case of custom reports. So when you are using custom reports in Google Analytics, the data is sampled.
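The conditions listed above can be written down as a simple checklist. The sketch below is not an official rule set: the thresholds are the ones quoted in this article (Google may change them), and `ReportRequest` and `sampling_reasons` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Thresholds as quoted in this article; treat them as assumptions.
VISIT_LIMIT = 500_000
MCF_CONVERSION_LIMIT = 1_000_000
FLOW_VISIT_LIMIT = 100_000

@dataclass
class ReportRequest:
    report_type: str = "standard"  # "standard", "multi_channel_funnel", "flow" or "custom"
    visits: int = 0
    conversions: int = 0
    has_segment: bool = False
    has_secondary_dimension: bool = False

def sampling_reasons(req):
    """Return the list of reasons why this report is likely to be sampled."""
    reasons = []
    if req.has_segment or req.has_secondary_dimension:
        reasons.append("advanced segment or secondary dimension applied")
    if req.visits > VISIT_LIMIT:
        reasons.append("report is based on more than 500k visits")
    if req.report_type == "multi_channel_funnel" and req.conversions > MCF_CONVERSION_LIMIT:
        reasons.append("multi channel funnel report with more than 1 million conversions")
    if req.report_type == "flow" and req.visits > FLOW_VISIT_LIMIT:
        reasons.append("flow visualization report based on more than 100k visits")
    if req.report_type == "custom":
        reasons.append("custom report querying data not available in aggregate")
    return reasons

if __name__ == "__main__":
    request = ReportRequest(visits=600_000, has_segment=True)
    for reason in sampling_reasons(request):
        print("likely sampled:", reason)
```

An empty list means none of the conditions from this article apply to the report.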

Sometimes when you view a sampled report, Google Analytics gives you the option to adjust (increase or decrease) the sample size by displaying the ‘checkboard’ button at the top right hand side of the report.

Sometimes you see the following notification in yellow at the top right of the report:

If you see such yellow notifications (‘the report is based on N visits’) every time you view a standard report or apply an advanced segment, you may have ‘data sampling’ issues.

Note: you can receive sampled data even when you are using the Google Analytics API.
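If you pull data through the API, it is worth checking each response for sampling before you use it. The sketch below assumes a parsed JSON response in the style of the Core Reporting API v3, carrying the fields `containsSampledData`, `sampleSize` and `sampleSpace`; verify these field names against the API reference for the version you use. The helper name `sampling_summary` is hypothetical.

```python
def sampling_summary(response):
    """Describe whether a parsed API response (a dict) was built from sampled data."""
    if not response.get("containsSampledData", False):
        return "unsampled"
    try:
        # Assumed field names; the values may arrive as strings, so convert them.
        sample_size = int(response["sampleSize"])
        sample_space = int(response["sampleSpace"])
    except (KeyError, TypeError, ValueError):
        return "sampled (sample size not reported)"
    if sample_space <= 0:
        return "sampled (sample size not reported)"
    share = sample_size / sample_space
    return f"sampled: based on {share:.1%} of visits"

if __name__ == "__main__":
    # Hand-written example response, not real API output.
    example = {"containsSampledData": True, "sampleSize": "250000", "sampleSpace": "1000000"}
    print(sampling_summary(example))
```

A check like this lets a script refuse to report e-commerce metrics when the response was built from only a small share of the visits.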

 

What Causes Data Sampling Issues?

If you have a low traffic website, you get unsampled data in your standard reports. If you have a high traffic website, then Google may start sampling data even for your standard reports.

Sampled data for high traffic websites often results in poor traffic estimates because the data sample is not a good representative of all of the website traffic. In such a case you have to face data sampling issues.

There are two main reasons which could cause data sampling issues:

  1. Data sample selected by Google Analytics is too small to make any accurate traffic estimates.
  2. Selected sample is not a good representative of all of the data.

Low traffic websites generally don’t face data sampling issues. Google Analytics handles ‘data sampling’ for such websites really well. It is only for the big websites which get millions of pageviews each month that GA may end up doing bad data sampling.

And the worst part is that you will never know unless you see that yellow notification or notice huge discrepancies between sampled and unsampled data reports.

 

Why are data sampling issues so damaging for your business?

If you have data sampling issues, you are probably a big business, and if you are a big business then the majority of your online marketing decisions are data driven. Consequently you need very high accuracy in your traffic data. Otherwise your metrics, from ‘conversion rate’ and ‘revenue’ to ‘visits’, could be anywhere from 10% to 80% off the mark.

For example, Google Analytics may report your last month’s revenue to be, say, $1.2 million when in fact it is only $650k. You can determine such discrepancies by comparing a sampled report with its unsampled version and then calculating the percentage difference between the various metrics.

Make sure that the difference is statistically significant before you draw any conclusion.
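Both checks can be sketched in a few lines. The percentage difference is straightforward. For significance, the example below uses a one-sample z-test that treats the unsampled rate as the true value and assumes simple random sampling of visits, which is a simplification of whatever Google Analytics actually does. The function names and the 10,000-visit sample size are hypothetical.

```python
import math

def percent_difference(sampled, unsampled):
    """How far the sampled figure is from the unsampled one, in percent."""
    return (sampled - unsampled) / unsampled * 100

def z_score_for_rate(sampled_rate, unsampled_rate, sample_visits):
    """z-score of a sampled conversion rate against the unsampled (true) rate.

    Assumes the sample is a simple random sample of `sample_visits` visits.
    """
    standard_error = math.sqrt(unsampled_rate * (1 - unsampled_rate) / sample_visits)
    return (sampled_rate - unsampled_rate) / standard_error

if __name__ == "__main__":
    # Revenue figures from the example above: $1.2 million sampled vs $650k unsampled.
    print(f"revenue is off by {percent_difference(1_200_000, 650_000):.1f}%")

    # Illustrative rates and sample size; |z| above roughly 1.96 is significant at the 5% level.
    z = z_score_for_rate(sampled_rate=0.0208, unsampled_rate=0.0139, sample_visits=10_000)
    print(f"z-score: {z:.2f}, significant: {abs(z) > 1.96}")
```

A large percentage difference with a small z-score usually just means the sample was too small to say anything either way.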

GA premium lets you download unsampled reports. Sampling can make a huge difference to your reports, so you need to be very careful. Any marketing decision based on inaccurate data could be fatal for the health of your enterprise business.

You can rely on metrics like ‘visits’ and ‘pageviews’ even with significant data sampling issues.

However you can’t rely on e-commerce metrics (like revenue, transactions, e-commerce conversion rate etc) when you have got data sampling issues.

 

How can you fix data sampling issues?

You can never truly eliminate the sampling of data.  But here is what you can do to minimize the impact of ‘data sampling’ issues:

1. Use the largest possible sample size by clicking on the ‘checkboard’ button and then dragging the slider to the extreme right:

The larger the data size being sampled, the more accurate the traffic estimates are.

On the other hand, the smaller the data size being sampled, the less accurate the traffic estimates are, as I have demonstrated above. But there is one caveat: there is still an upper limit on how big you can make the sample.

Your website can still have data sampling issues even with the biggest possible sample size you selected via the ‘checkboard’ button.

Note: When you use the largest possible sample size you may have to wait a little longer for your report to load, but this wait is generally not more than a few seconds. I prefer waiting a little longer to get accurate data.

2. Avoid applying advanced segments or secondary dimensions to your standard reports when you are going to use metrics like ‘e-commerce conversion rate’ for reporting purpose and your analytics account has got data sampling issues.

3. Run reports for a shorter time frame that includes fewer than 500k visits. Remember:

whenever you view a report that is based on more than 500k visits, Google Analytics automatically starts sampling the data.

Download the data and then aggregate it manually.

Note: You can’t aggregate ratio metrics like bounce rate or conversion rate. However you can aggregate number metrics like visits, page views etc. 

4. Plan out in advance how you want to segment the data and view the reports. Then instead of applying advanced segments create filtered profiles.

The data that is filtered at the profile level is unsampled.

For example if you apply the advanced segment ‘non-paid search traffic’ to the ‘All Traffic’ report so that you can determine the ‘e-commerce conversion rate’ of your organic search then your report data will be sampled. But if you create a filtered profile which shows only ‘organic search data’ then your report data will be unsampled.
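For the third tip above (downloading shorter date ranges and aggregating them manually), the safe way to handle a ratio metric is to add up its underlying counts and recompute the ratio from the totals. Here is a minimal sketch with made-up numbers; the row layout and the function names are hypothetical.

```python
def aggregate(rows):
    """Sum count metrics across exports and recompute the conversion rate from totals."""
    total_visits = sum(row["visits"] for row in rows)
    total_transactions = sum(row["transactions"] for row in rows)
    rate = total_transactions / total_visits if total_visits else 0.0
    return {
        "visits": total_visits,
        "transactions": total_transactions,
        "conversion_rate": rate,
    }

def naive_average_rate(rows):
    """The wrong approach: averaging per-export rates ignores how many visits each had."""
    rates = [row["transactions"] / row["visits"] for row in rows]
    return sum(rates) / len(rates)

if __name__ == "__main__":
    # Two hypothetical exports, each covering a short date range.
    exports = [
        {"visits": 1_000, "transactions": 10},   # 1% conversion rate
        {"visits": 9_000, "transactions": 270},  # 3% conversion rate
    ]
    totals = aggregate(exports)
    print(f"recomputed from totals: {totals['conversion_rate']:.2%}")
    print(f"naive average of rates: {naive_average_rate(exports):.2%}")
```

Here the rate recomputed from totals is 2.80% (280 transactions over 10,000 visits), while the naive average of the two rates is 2.00%, because the busier period carries more weight in the true figure.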

 

Ultimate Solution to Data Sampling issues

All of the solutions I have mentioned so far are quick fixes but not permanent solutions for bad data sampling. At some point you have to switch to Google Analytics Premium or other enterprise level analytics software if the tips above are not helping you much in minimizing data sampling issues.

You can’t rely on free versions of analytics tools for processing large amounts of data with high accuracy.

I have been using GA premium for quite a while now and I get a lot of emails from people asking about its capabilities and whether the $150k/year spend is really worth it. I say,

if your annual online revenue is at least $1 million and your website gets more than 10 million pageviews/month then you should definitely invest in enterprise level analytics software like GA premium.

From my experience it is hard to justify $150k/year spend on an analytics tool if the online revenue is less than $1 million per year.

 

Google Analytics Premium and Data Sampling

The data sampling limit of GA premium is approx. 200 times that of standard GA, which means you get more unsampled data in GA premium than in GA standard.

GA premium can handle websites which get up to 1 billion pageviews /month.

So unless you run a website like Google or Yahoo, these data limits should be sufficient for you. The interface of GA premium is just like that of standard GA. So to an untrained eye, there is no visual difference between GA premium and GA standard.

But if you think of GA standard as a 250 cc bike, then GA premium is a 3500 cc bike. The real difference is in the processing power (besides unsampled reports). Processing large data sets and producing unsampled reports put a huge load on the servers and are very costly. I think this is what makes GA premium so expensive.

If you have access to GA premium, you can follow the steps below to get unsampled reports:

Step-1: Click on the ‘standard reporting’ tab.

Step-2: Select the report whose unsampled data you want to download.

Step-3: Select ‘unsampled report’ from the ‘Export’ tab:

Note: Unsampled Reports are available only in GA premium.

Step-4: Name the report, select the frequency and click on the ‘request unsampled’ button. The frequency is how often you want the unsampled report: once, daily, weekly, monthly or quarterly:

Step-5: Click on the ‘custom reporting’ tab > Unsampled Downloads > Overview to see all the unsampled reports you have requested and the availability status of each report (pending, completed):

Once the report is available for download, click ‘csv’ to download the report.

Note: Unsampled data is not available in all reports. It is not available for reports which contain non-standard tables and data views, like overview reports. If you can customize such reports, then the unsampled data may be available via the ‘custom reporting’ tab.

If you try to download unsampled data for non-standard table reports (like overview reports), you will get the following message from Google:

‘Unsampling is not available because this report is not a standard table report’
