In this article, I will explain how data sampling works in Google Analytics 4 (GA4). I will also cover hit limits, thresholding, and cardinality in GA4.
In data analysis, sampling is the process of analyzing a subset of data and reporting on it as representative of the larger data set, on the assumption that the subset and the larger data set are similar.
For example, if you want to estimate the number of cars parked in a 1,000 square meter area where the cars are distributed fairly uniformly, you could count the cars parked in 10 square meters and multiply by 100, or count the cars parked in 5 square meters and multiply by 200, to get a reasonable estimate for the entire 1,000 square meters.
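To make the arithmetic concrete, here is a minimal Python sketch of the same scaling. The function name `estimate_total` and the car counts are made up for illustration; only the areas come from the example above.

```python
def estimate_total(sample_count, sample_area, total_area):
    """Scale a count observed in a sample area up to the whole area.

    Assumes the items are spread fairly uniformly, as in the car park example.
    """
    if sample_area <= 0 or sample_area > total_area:
        raise ValueError("sample_area must be positive and no larger than total_area")
    scale = total_area / sample_area  # 1000 / 10 = 100, 1000 / 5 = 200
    return sample_count * scale


# Hypothetical counts: 3 cars seen in 10 sq m, 2 cars seen in 5 sq m.
print(estimate_total(3, 10, 1000))  # 300.0
print(estimate_total(2, 5, 1000))   # 400.0
```

The two estimates differ because each is based on a different small sample. The smaller the sample, the more the estimate depends on how uniform the data really is.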
In Google Analytics 4, some reports are always unsampled, and others are sampled under certain conditions. Let’s look at how sampling happens in GA4 in more detail.
In Google Analytics 4, reporting is divided into two categories: standard reports and advanced reports (found in the ‘Analysis’ tab).
Standard reports are always unsampled in GA4 (based on 100% of data for the selected date range), and advanced reports are sometimes sampled, depending on the data you choose to see.
The image below shows the standard reporting options in GA4, which are unsampled.
The next image shows the advanced reporting options in GA4, which are sometimes sampled.
These advanced reports include the following techniques: cohort analysis, exploration, segment overlap, funnel analysis, path analysis, and user explorer.
In Universal Analytics, the data may be sampled if you apply a secondary dimension or segment to the standard reports. In GA4, by contrast, you can apply comparisons and secondary dimensions and filter your reports, and everything will continue to be unsampled.
If you are viewing an unsampled GA4 report, you will see a green reporting icon with a checkmark at the top of the report.
If you hover your mouse over the green reporting icon, you will see the following message: “This report is based on 100.0% of available data.”
If you are viewing a sampled GA4 report, you will see a yellow reporting icon with a % symbol at the top of the report.
If you hover your mouse over the yellow reporting icon, you will see the following message: “This report is based on XX% of available data.” (In our case, XX is 95.28.)
Sampling differences in Google Analytics 4 Vs Universal Analytics
In Universal Analytics, default reports (standard reports) are not subject to sampling. But if you apply ad-hoc queries to your data (like secondary dimensions or segments), they are subject to the following general sampling thresholds:
Standard Analytics: 500k sessions at the property level for the date range you are using
Analytics 360: 100M sessions at the view level for the date range you are using
In the case of Google Analytics 4, the default reports (standard reports) are always unsampled. You can apply comparisons and custom parameters to your report, and all the reports will continue to be unsampled.
Advanced reports in the ‘Analysis’ tab may sometimes be sampled. In general, sampling occurs in advanced reporting when the data exceeds a count of 10 million and the report you are creating does not replicate a standard report.
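The thresholds in this section can be written down as a small Python sketch. The function and constant names are my own, and the numbers are the rules of thumb quoted in this article (500k and 100M sessions for Universal Analytics, 10 million counts for GA4 advanced reports). Treat it as an illustration of the conditions, not as how Google decides internally.

```python
# Rules of thumb quoted in this article, not values read from any Google API.
UA_STANDARD_SESSIONS = 500_000       # Universal Analytics (standard), property level
UA_360_SESSIONS = 100_000_000        # Analytics 360, view level
GA4_ADVANCED_COUNT = 10_000_000      # GA4 advanced reports in the 'Analysis' tab


def ua_adhoc_may_be_sampled(sessions, is_360=False):
    """Ad-hoc queries in Universal Analytics (secondary dimensions, segments)."""
    limit = UA_360_SESSIONS if is_360 else UA_STANDARD_SESSIONS
    return sessions > limit


def ga4_report_may_be_sampled(count, is_advanced, replicates_standard=False):
    """Standard GA4 reports are never sampled; advanced ones sometimes are."""
    if not is_advanced:
        return False
    return count > GA4_ADVANCED_COUNT and not replicates_standard


print(ua_adhoc_may_be_sampled(600_000))                          # True
print(ga4_report_may_be_sampled(50_000_000, is_advanced=False))  # False
print(ga4_report_may_be_sampled(12_000_000, is_advanced=True))   # True
```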
Hit limits in Google Analytics 4
In Universal Analytics (standard), there is a hit limit of 10 million per month per account. Google Analytics 4, however, is a free tool with no hit limits that I could find. I have searched extensively, and no such limit is mentioned anywhere in the documentation. This makes it feel like a premium analytics tool at no cost.
Thresholding in Google Analytics 4
In Google Analytics 4, thresholds are applied to prevent anyone viewing a report from inferring the demographics or interests of individual users of the website.
When a report contains age, gender, or interest categories (e.g. as a primary or secondary dimension, a data comparison, or a segment), a threshold may be applied, and some data may be kept hidden (unknown) from the report.
Google defines these thresholds, and you cannot adjust them. If a threshold has been applied to a report, you will see “unknown” values in it: the affected values are replaced by “unknown” to keep user identity and basic information hidden.
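The idea behind thresholding can be illustrated with a short Python sketch. Google does not publish the actual threshold or the exact logic, so the cutoff `HYPOTHETICAL_MIN_USERS`, the function name, and the sample numbers below are all assumptions made purely for illustration.

```python
# Assumption: the real threshold is defined by Google and is not public.
HYPOTHETICAL_MIN_USERS = 50


def apply_threshold(user_counts, min_users=HYPOTHETICAL_MIN_USERS):
    """Fold demographic values with too few users into a single 'unknown' entry."""
    result = {}
    for value, users in user_counts.items():
        key = value if users >= min_users else "unknown"
        result[key] = result.get(key, 0) + users
    return result


# Made-up user counts per age bracket.
age_report = {"18-24": 120, "25-34": 300, "55-64": 12, "65+": 7}
print(apply_threshold(age_report))
# {'18-24': 120, '25-34': 300, 'unknown': 19}
```

The small groups are the ones where a viewer could most easily guess who an individual user is, which is why they are the ones that get hidden.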
Cardinality in Google Analytics 4
Each report in Google Analytics 4 has dimensions assigned to it, and each dimension can take a number of values. The total number of unique values for a dimension is known as its cardinality. For example, the gender dimension has three potential values (male, female or other), so that dimension’s cardinality is three.
Dimensions with a large number of possible values are known as high-cardinality dimensions. For example, the page dimension has different values for every URL on your website.
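In code terms, cardinality is simply the number of distinct values seen for a dimension. Here is a minimal Python sketch; the function name and the sample values are made up for illustration.

```python
def cardinality(values):
    """Number of unique values observed for a dimension."""
    return len(set(values))


# Made-up dimension values, one entry per recorded event.
gender = ["male", "female", "other", "female", "male"]
pages = ["/", "/pricing", "/blog/post-1", "/blog/post-2", "/", "/contact"]

print(cardinality(gender))  # 3
print(cardinality(pages))   # 5
```

The gender dimension can never go above three, while the page dimension keeps growing with every new URL on the site.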
If a report contains high-cardinality dimensions, it may be affected by Google-defined system limits, resulting in rolled-up “(other)” entries in the report.
Cardinality limits may affect standard reports as well as advanced reports in the ‘Analysis’ tab.
Google does not publish an exact limit at which this roll-up appears, but in general it may occur if a dimension has more than 25,000 to 30,000 unique values in the selected date range.
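A rough Python sketch of what a roll-up looks like is shown below. Everything here is an assumption for illustration: the function name, the `max_rows` parameter, and the choice to keep the most frequent values. Google does not document how GA4 decides which values end up in the rolled-up entry.

```python
from collections import Counter


def roll_up(values, max_rows):
    """Keep the max_rows most frequent values and fold the rest into '(other)'."""
    counts = Counter(values)
    if len(counts) <= max_rows:
        return dict(counts)  # under the limit, nothing is rolled up
    kept = dict(counts.most_common(max_rows))
    kept["(other)"] = sum(counts.values()) - sum(kept.values())
    return kept


# Made-up page views; a tiny limit of 2 rows stands in for the real system limit.
page_views = ["/a", "/a", "/a", "/b", "/b", "/c", "/d"]
print(roll_up(page_views, max_rows=2))
# {'/a': 3, '/b': 2, '(other)': 2}
```

The totals still add up after the roll-up, but you can no longer see which individual pages sit inside the “(other)” entry.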
Summary
GA4 always shows unsampled data in standard reports. Only the advanced reporting options in the ‘Analysis’ tab (cohort analysis, exploration, segment overlap, funnel analysis, path analysis, and user explorer) might be sampled.
Frequently Asked Questions about Understanding Data Sampling in Google Analytics 4 (GA4)
What is data sampling?
In data analysis, sampling is the process of analyzing a subset of data and reporting on it as representative of the larger data set, on the assumption that the subset and the larger data set are similar.
Are all reports in GA4 sampled?
No. Standard reports are always unsampled in GA4 (based on 100% of data for the selected date range), and advanced reports are sometimes sampled, depending on the data you choose to see.
What is thresholding in GA4?
When a report contains age, gender, or interest categories (e.g. as a primary or secondary dimension, a data comparison, or a segment), a threshold may be applied, and some data may be kept hidden (unknown) from the report.
Google defines these thresholds, and you cannot adjust them. If a threshold has been applied to a report, you will see “unknown” values in it: the affected values are replaced by “unknown” to keep user identity and basic information hidden.
What is cardinality in GA4?
The total number of unique values for a dimension is known as its cardinality. For example, the gender dimension has three potential values (male, female or other), so that dimension’s cardinality is three.
About the Author
Himanshu Sharma
Founder, OptimizeSmart.com