BigQuery is a data warehouse from Google that is used to provide business intelligence through reports and dashboards.
It is a data storage and management system used to bring data from several data sources (like Google Analytics, Google Ads, Facebook, etc.) for the purpose of reporting and analysis.
Therefore, in order to use BigQuery, you will need a Google Cloud Platform account.
Why should you use GA4 with BigQuery?
To benefit from GA4 the most, learn and master BigQuery.
When you use GA4 with BigQuery, you can:
Save your user-specific data from being deleted.
You can access unsampled raw events and user-level data.
Your data is not subject to cardinality limits.
You actually own your data.
You can do advanced data manipulation.
You can integrate GA4 data with other data sources.
You can work retroactively on GA4 data.
BigQuery lets you easily filter out or modify incorrect GA4 data.
#1 Save your user-specific data from being deleted.
Most people don’t know, but they lose user-specific data for inactive website users in GA4 every 2 or 14 months based on their current data retention settings:
Cardinality refers to the number of unique values in a data set.
A data set is a set of observed values for a particular variable.
For example,
Consider the following data set: {20, 15, 61, 8}
This data set has four unique values. Therefore the cardinality of this data set is four.
The data set with many unique values is called a high cardinality data set.
Google defines high cardinality dimensions as dimensions with more than 500 unique values in one day.
So if GA4 reports more than 500 unique values for the ‘Item name’ dimension in one day, then it would be considered as a high cardinality dimension.
When you use a high cardinality dimension in a GA4 report, it could result in some (or a lot of data) reported under the (other) row:
#4 You actually own your data.
When you use BigQuery to collect and store GA4 data, you get complete control and ownership of your data.
This means you can choose how long to keep your data, who has access to it, how it is used, etc.
When you use GA4 to collect and store your data, your data is stored on Google’s servers and is subject to Google’s data retention policies.
#5 You can do advanced data manipulation.
When you use BigQuery, you can manipulate GA4 data in ways that are many times not possible when using the GA4 user interface or the data API.
One of the biggest advantages of using BigQuery is that it does not have the same limitations as the GA4 user interface and API regarding which dimensions and metrics can be queried together.
That means you can perform more advanced data segmentation and analysis when using GA4 with BigQuery.
When you use GA4 with BigQuery, you get access to a wide range of tools and features that are not available in the GA4 user interface or the data API.
For example,
With BigQuery, you can create custom queries to extract specific data, join data from multiple data sources, and perform complex calculations and aggregations.
BigQuery also provides machine learning capabilities that can help you forecast future trends and gain deeper insights.
As a result of these capabilities, many companies and web analysts use BigQuery as their primary tool for querying analytics data.
They may use the GA4 user interface and API for basic queries and reporting but rely mainly on BigQuery for more advanced analysis and insights.
#6 You can integrate GA4 data with other data sources.
Since BigQuery is a data warehouse, it allows you to import data from multiple data sources and then combine and correlate them with GA4 data.
These data sources could include (but are not limited to) CRMs, shopping carts, marketing automation platforms, etc.
You can easily integrate BigQuery with data visualisation tools like Looker Studio and can thus easily visualise GA4 data.
Integrating GA4 data with other data sources can provide several key advantages, such as a better understanding of the customer’s purchase journey, better decision-making, and improved marketing effectiveness.
#7 You can work retroactively on GA4 data
When you use the GA4 user interface or the data API, the conversions and filters do not work retroactively and are collected and reported only from when you first set up your tracking.
That means that if you want to calculate conversions based on historical data, it is not possible with the GA4 user interface or data API.
However, when you use BigQuery to store your GA4 data, you can work retroactively on your data and perform calculations and analysis based on historical data.
#8 BigQuery lets you easily filter out or modify incorrect GA4 data.
When you use GA4 with BigQuery, you can easily filter out or modify incorrect GA4 data from your analysis and reports.
This can help ensure that your data analysis and reports are based on accurate data and can lead to better decision-making.
In contrast, when you use the GA4 user interface or data API to query data, filtering out or modifying incorrect data can be more challenging.
This is because the GA4 user interface and data API do not provide SQL querying capabilities, which makes it harder to manipulate the data to filter out or modify incorrect data.
Why are people reluctant to use BigQuery?
Despite several advantages of using GA4 with BigQuery, many people are still reluctant to use BigQuery and rely on either the GA4 user interface or the data API.
They are two main reasons why people are reluctant to use BigQuery:
#1. They think it is crazy expensive to use, but it is not (for the majority of companies).
BigQuery is of little use to you if you can’t query the data.
But to query the data in BigQuery, you need to know SQL.
SQL (Structured Query Language) is a programming language which is used to store, access and manipulate data in a database like BigQuery.
Why should you use natural language to generate SQL?
SQL has a complex syntax and structure.
You will need to learn and memorise many keywords, operators, and functions. And understanding how to combine them can be challenging.
SQL queries can become progressively harder to understand and debug as they become more complex.
Even small syntax errors or logical mistakes can result in incorrect results.
Complex queries often involve nested subqueries, multiple joins, and complex logic, which can be difficult to follow.
All of this can make SQL hard to learn and use, especially for absolute beginners.
With the advent of AI-powered tools like GA4 BigQuery Composer, it is now possible to convert natural language into SQL queries.
Natural language refers to the language humans use daily to communicate, such as English, French, Spanish, etc.
We use the words ‘natural language’ mainly in the context of machine languages (also known as artificial languages or programming languages).
To convert natural language into SQL query, you give instructions to ‘GA4 BigQuery Composer’ in plain English, and then the composer converts your text-based instructions into SQL for BigQuery.
For example:
What is GA4 BigQuery Composer?
GA4 BigQuery Composer is a ChatGPT plugin for creating efficient, precise SQL queries for GA4 BigQuery.
What GA4 BigQuery Composer can do that other tools can not do?
#1 Through ‘GA4 BigQuery composer’ tool, you can write SQL queries for Google BigQuery even when you don’t understand a single line of code. The Composer will automatically convert your instructions in plain English into SQL code.
For example:
#2 Unlike the regular chatgpt, the ‘GA4 BigQuery composer’ is specially trained on GA4 BigQuery data sets. So, its results are likely to be much more accurate.
#3 Through ‘GA4 BigQuery composer’, you can generate even complex SQL queries within seconds.
#4 The composer allows you to troubleshoot SQL even when you don’t understand a single line of code. Just copy-paste the error message into the Composer tool, and it will rewrite the SQL for you.
For example:
#5 If you are not satisfied with the SQL generated by ‘Composer’ (maybe you want to rename a particular column, add a new field, or sort data by a particular column), it can re-write the SQL for you within seconds.
#6 When you use ‘GA4 BigQuery composer’, its usage drastically reduces your chances of making errors in query syntax or structure while writing SQL queries.
#7 By automating the query-building process, the Composer saves tons of your time.
#6 You must have access to ChatGPT‘s paid version (‘Plus’ or ‘Team’) as custom GPTs like ‘GA4 BigQuery Composer’ are available only in the paid versions.
Now, let’s move to the best part, i.e. generating SQL via ‘GA4 BigQuery Composer’.
Example-1
Consider the following data table:
This data table is for March 15, 2023 which is denoted by 20230315
The table id for this data table is: dbrt-ga4.analytics_207472454.events_20230315
Let us suppose you want to see only the ‘event_name’ and ‘event_date’ columns in the table.
Let us also suppose you want to see only the first 10 rows.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which shows only the ‘event_name’ and ‘event_date’ columns of the data table and show only the top 10 rows for March 15, 2023.
Provide your table ID when asked.
You should now see the required SQL:
Copy-paste the SQL code into the SQL editor (of your BigQuery project) and then click on the ‘Run’ button:
You should now see the query results in the window below:
From the screenshot above, we can conclude that our data table contains 10 rows and the following two columns: ‘event_name’ and ‘event_date’. This proves that our SQL was correct.
Example-2:
Consider the following data table:
You want to query the data which meets the following requirements:
You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
You want to see all the columns of the data table.
You want to see only those rows where the ‘event_name’ is ‘page_view’
Use the following text prompt in ‘GA4 BigQuery Composer’
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Shows all the columns of the data table.
3) Shows only those rows where the ‘event_name’ is ‘page_view’
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
From the screenshot above, we can conclude that our data table contains all columns but only those rows where the ‘event_name’ is ‘page_view.’ This proves that our SQL was correct.
Example-3:
You want to query the data which meets the following requirements:
You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
You want to see all the columns of the data table.
You want to see only those rows where the ‘event_name’ is ‘page_view’
You want to see only the first 10 rows.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Shows all the columns of the data table.
3) Shows only those rows where the ‘event_name’ is ‘page_view’
4) Show only the first 10 rows.
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
From the screenshot above, we can conclude that our data table contains all columns, but only those rows where the ‘event_name’ is ‘page_view’, and it contains only 10 rows.
This proves that our SQL was correct.
Example-4:
You want to query the data which meets the following requirements:
You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
You want to see only the ‘event_name’ column
You want to see only those rows where the ‘event_name’ is ‘session_start’.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show only the ‘event_name’ column of the data table.
3) Shows only those rows where the ‘event_name’ is ‘session_start’
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-5:
You want to query the data which meets the following requirements:
You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
You want to see only the ‘event_name’ column
You want to see only those rows where the ‘event_name’ is ‘session_start’
You want to see ‘event_name’ column as ‘count of session_start’
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show only the ‘event_name’ column of the data table.
3) Shows only those rows where the ‘event_name’ is ‘session_start’
4) Show the value of ‘event_name’ column as ‘count of session_start’
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
The provided SQL query retrieves data specifically for sessions that started on March 15, 2023.
Example-6:
You want to query the data which meets the following requirements:
You want to query data between the date range March 17, 2023 and March 19, 2023.
You want to see only the ‘event_date’ column.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data between the date range March 17, 2023 and March 19, 2023.
2) Show only the ‘event_date’ column of the data table.
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-7:
You want to query the data which meets the following requirements:
You want to query data between the date range March 17, 2023 and March 19, 2023
You want to see only the ‘event_date’ column.
You want to sort the ‘event_date’ column in ascending order.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data between the date range March 17, 2023 and March 19, 2023.
2) Show only the ‘event_date’ column of the data table.
3) Sort the ‘event_date’ column in ascending order.
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-8:
Consider the following data table:
You want to query the data which meets the following requirements:
You want to query data for March 15, 2023
You want to see the ‘event_name’ column
You want to see the value of ‘page_location’ event parameter.
You want to see the value of ‘page_location’ as ‘page_location’ column.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show the ‘event_name’ column of the data table.
3) Show the value of ‘page_location’ event parameter as ‘page_location’ column.
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
Pay attention to the last two line of the text prompt:
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
It is a good practice to let ‘GA4 BigQuery Composer’ know how to retrieve the value of the ‘page_location’ event parameter.
GA4 BigQuery Composer should produce the following SQL:
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-9:
You want to query the data which meets the following requirements:
You want to query data for March 15, 2023
You want to see the ‘event_name’ column where ‘event_name’ is ‘first_visit’
You want to see the value of ‘page_title’ event parameter.
You want to see the value of ‘page_title’ as ‘page_title’ column.
You want to see the value of ‘page_location’ event parameter.
You want to see the value of ‘page_location’ as ‘page_location’ column.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show the ‘event_name’ column of the data table where the value of ‘event_name’ is ‘first_visit’
3) Show the value of ‘page_title’ event parameter as ‘page_title’ column.
The ‘page_title’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_title’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
4) Show the value of ‘page_location’ event parameter as ‘page_location’ column.
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
GA4 BigQuery Composer should produce the following SQL:
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-10:
You want to query the data which meets the following requirements:
You want to query data for March 15, 2023
You want to see the ‘event_name’ column where ‘event_name’ is ‘first_visit’
You want to see the value of ‘page_title’ event parameter.
You want to see the value of ‘page_title’ as ‘page_title’ column.
You want to see the value of ‘page_location’ event parameter.
You want to see the value of ‘page_location’ as ‘page_location’ column.
You want to see the value of ‘engaged_session_event’ event parameter.
You want to see the value of ‘engaged_session_event’ as ‘engaged_session_event’ column.
You want to see the value of ‘engaged_session_event’ column where ‘engaged_session_event’ is 1
Use the following text prompt in ChatGPT:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show the ‘event_name’ column of the data table where the value of ‘event_name’ is ‘first_visit’
3) Show the value of ‘page_title’ event parameter as ‘page_title’ column.
The ‘page_title’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_title’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
4) Show the value of ‘page_location’ event parameter as ‘page_location’ column.
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
5) Show the value of ‘engaged_session_event’ event parameter as ‘engaged_session_event’ column.
The ‘engaged_session_event’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘engaged_session_event’ event parameter is obtained by selecting ‘event_params.value.int_value’ field.
6) Show only those values of ‘engaged_session_event’ event parameter where ‘engaged_session_event’ is equal to 1.
GA4 BigQuery Composer should produce the following SQL:
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
My best selling books on Digital Analytics and Conversion Optimization
Maths and Stats for Web Analytics and Conversion Optimization
This expert guide will teach you how to leverage the knowledge of maths and statistics in order to accurately interpret data and take actions, which can quickly improve the bottom-line of your online business.
Master the Essentials of Email Marketing Analytics
This book focuses solely on the ‘analytics’ that power your email marketing optimization program and will help you dramatically reduce your cost per acquisition and increase marketing ROI by tracking the performance of the various KPIs and metrics used for email marketing.
Attribution Modelling in Google Analytics and BeyondSECOND EDITION OUT NOW!
Attribution modelling is the process of determining the most effective marketing channels for investment. This book has been written to help you implement attribution modelling. It will teach you how to leverage the knowledge of attribution modelling in order to allocate marketing budget and understand buying behaviour.
Attribution Modelling in Google Ads and Facebook
This book has been written to help you implement attribution modelling in Google Ads (Google AdWords) and Facebook. It will teach you, how to leverage the knowledge of attribution modelling in order to understand the customer purchasing journey and determine the most effective marketing channels for investment.
About the Author
Himanshu Sharma
Founder, OptimizeSmart.com
Over 15 years of experience in digital analytics and marketing
Author of four best-selling books on digital analytics and conversion optimization
Nominated for Digital Analytics Association Awards for Excellence
Runs one of the most popular blogs in the world on digital analytics
Consultant to countless small and big businesses over the decade
Automate GA4 BigQuery SQL With ChatGPT - [No Prior Knowledge of BigQuery or SQL Required]
This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
Cookie
Duration
Description
cookielawinfo-checkbox-analytics
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional
11 months
The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance
11 months
This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy
11 months
The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.
Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.