Query GA4 data in BigQuery without understanding SQL
In this article, I will show you how to query GA4 data in BigQuery without understanding a single line of SQL code.
I have been using GA4 with BigQuery for over 1100 days (i.e. over 3 years) now.
I also teach BigQuery.
A lot of people don’t use BigQuery because they need to learn SQL first, and SQL is hard.
My goal is to make BigQuery accessible to all and make the knowledge of SQL irrelevant.
What is BigQuery?
BigQuery is a data warehouse from Google that is used to provide business intelligence through reports and dashboards.
It is a data storage and management system used to bring data from several data sources (like Google Analytics, Google Ads, Facebook, etc.) for the purpose of reporting and analysis.
BigQuery is one of the products of the Google Cloud platform.
Therefore, in order to use BigQuery, you will need a Google Cloud Platform account.
Why should you use GA4 with BigQuery?
To benefit from GA4 the most, learn and master BigQuery.
When you use GA4 with BigQuery, you can:
- Save your user-specific data from being deleted.
- You can access unsampled raw events and user-level data.
- Your data is not subject to cardinality limits.
- You actually own your data.
- You can do advanced data manipulation.
- You can integrate GA4 data with other data sources.
- You can work retroactively on GA4 data.
- BigQuery lets you easily filter out or modify incorrect GA4 data.
#1 Save your user-specific data from being deleted.
Most people don’t know, but they lose user-specific data for inactive website users in GA4 every 2 or 14 months based on their current data retention settings:
To avoid this data loss, connect your GA4 property to BigQuery and start collecting GA4 data there.
#2 You can access unsampled raw events and user-level data.
In GA4, the raw events and user-level data are only available via the explorations reports, but they are subject to data sampling.
On the other hand, BigQuery data tables do not suffer from data sampling issues.
You get access to unsampled raw events and user-level data when you use BigQuery with GA4.
#3 Your data is not subject to cardinality limits.
When you access GA4 data via the reporting interface or the data API, it can suffer from cardinality issues.
When using GA4 reports, you should avoid cardinality wherever possible. That’s how you can remove the (other) row from appearing in your GA4 reports.
Cardinality refers to the number of unique values in a data set.
A data set is a set of observed values for a particular variable.
For example,
Consider the following data set: {20, 15, 61, 8}
This data set has four unique values. Therefore the cardinality of this data set is four.
The data set with many unique values is called a high cardinality data set.
Google defines high cardinality dimensions as dimensions with more than 500 unique values in one day.
So if GA4 reports more than 500 unique values for the ‘Item name’ dimension in one day, then it would be considered as a high cardinality dimension.
When you use a high cardinality dimension in a GA4 report, it could result in some (or a lot of data) reported under the (other) row:
#4 You actually own your data.
When you use BigQuery to collect and store GA4 data, you get complete control and ownership of your data.
This means you can choose how long to keep your data, who has access to it, how it is used, etc.
When you use GA4 to collect and store your data, your data is stored on Google’s servers and is subject to Google’s data retention policies.
#5 You can do advanced data manipulation.
When you use BigQuery, you can manipulate GA4 data in ways that are many times not possible when using the GA4 user interface or the data API.
One of the biggest advantages of using BigQuery is that it does not have the same limitations as the GA4 user interface and API regarding which dimensions and metrics can be queried together.
That means you can perform more advanced data segmentation and analysis when using GA4 with BigQuery.
When you use GA4 with BigQuery, you get access to a wide range of tools and features that are not available in the GA4 user interface or the data API.
For example,
With BigQuery, you can create custom queries to extract specific data, join data from multiple data sources, and perform complex calculations and aggregations.
BigQuery also provides machine learning capabilities that can help you forecast future trends and gain deeper insights.
As a result of these capabilities, many companies and web analysts use BigQuery as their primary tool for querying analytics data.
They may use the GA4 user interface and API for basic queries and reporting but rely mainly on BigQuery for more advanced analysis and insights.
#6 You can integrate GA4 data with other data sources.
Since BigQuery is a data warehouse, it allows you to import data from multiple data sources and then combine and correlate them with GA4 data.
These data sources could include (but are not limited to) CRMs, shopping carts, marketing automation platforms, etc.
You can easily integrate BigQuery with data visualisation tools like Looker Studio and can thus easily visualise GA4 data.
Integrating GA4 data with other data sources can provide several key advantages, such as a better understanding of the customer’s purchase journey, better decision-making, and improved marketing effectiveness.
#7 You can work retroactively on GA4 data
When you use the GA4 user interface or the data API, the conversions and filters do not work retroactively and are collected and reported only from when you first set up your tracking.
That means that if you want to calculate conversions based on historical data, it is not possible with the GA4 user interface or data API.
However, when you use BigQuery to store your GA4 data, you can work retroactively on your data and perform calculations and analysis based on historical data.
#8 BigQuery lets you easily filter out or modify incorrect GA4 data.
When you use GA4 with BigQuery, you can easily filter out or modify incorrect GA4 data from your analysis and reports.
This can help ensure that your data analysis and reports are based on accurate data and can lead to better decision-making.
In contrast, when you use the GA4 user interface or data API to query data, filtering out or modifying incorrect data can be more challenging.
This is because the GA4 user interface and data API do not provide SQL querying capabilities, which makes it harder to manipulate the data to filter out or modify incorrect data.
Why are people reluctant to use BigQuery?
Despite several advantages of using GA4 with BigQuery, many people are still reluctant to use BigQuery and rely on either the GA4 user interface or the data API.
They are two main reasons why people are reluctant to use BigQuery:
#1. They think it is crazy expensive to use, but it is not (for the majority of companies).
#2 They need to know SQL really well.
Related Articles:
What is SQL?
BigQuery is of little use to you if you can’t query the data.
But to query the data in BigQuery, you need to know SQL.
SQL (Structured Query Language) is a programming language which is used to store, access and manipulate data in a database like BigQuery.
Why should you use natural language to generate SQL?
SQL has a complex syntax and structure.
You will need to learn and memorise many keywords, operators, and functions. And understanding how to combine them can be challenging.
SQL queries can become progressively harder to understand and debug as they become more complex.
Even small syntax errors or logical mistakes can result in incorrect results.
Complex queries often involve nested subqueries, multiple joins, and complex logic, which can be difficult to follow.
All of this can make SQL hard to learn and use, especially for absolute beginners.
With the advent of AI-powered tools like GA4 BigQuery Composer, it is now possible to convert natural language into SQL queries.
Natural language refers to the language humans use daily to communicate, such as English, French, Spanish, etc.
We use the words ‘natural language’ mainly in the context of machine languages (also known as artificial languages or programming languages).
To convert natural language into SQL query, you give instructions to ‘GA4 BigQuery Composer’ in plain English, and then the composer converts your text-based instructions into SQL for BigQuery.
For example:
What is GA4 BigQuery Composer?
GA4 BigQuery Composer is a ChatGPT plugin for creating efficient, precise SQL queries for GA4 BigQuery.
What GA4 BigQuery Composer can do that other tools can not do?
#1 Through ‘GA4 BigQuery composer’ tool, you can write SQL queries for Google BigQuery even when you don’t understand a single line of code. The Composer will automatically convert your instructions in plain English into SQL code.
For example:
#2 Unlike the regular chatgpt, the ‘GA4 BigQuery composer’ is specially trained on GA4 BigQuery data sets. So, its results are likely to be much more accurate.
#3 Through ‘GA4 BigQuery composer’, you can generate even complex SQL queries within seconds.
#4 The composer allows you to troubleshoot SQL even when you don’t understand a single line of code. Just copy-paste the error message into the Composer tool, and it will rewrite the SQL for you.
For example:
#5 If you are not satisfied with the SQL generated by ‘Composer’ (maybe you want to rename a particular column, add a new field, or sort data by a particular column), it can re-write the SQL for you within seconds.
#6 When you use ‘GA4 BigQuery composer’, its usage drastically reduces your chances of making errors in query syntax or structure while writing SQL queries.
#7 By automating the query-building process, the Composer saves tons of your time.
To learn more about the ‘GA4 BigQuery composer’ and how it works, check out this article: GA4 BigQuery Composer Tutorial for ChatGPT.
Prerequisites for querying GA4 data in BigQuery without understanding SQL
#1 You need a Google Cloud Platform account with billing enabled. To enable the billing, you need a valid credit card.
#2 You need a project to store GA4 data. If you don’t have one, then create a new project in Google Cloud Console to store GA4 data.
#3 Your GA4 property must be connected to your BigQuery project.
#4 You need to have at least a couple of days of data in the events_data table. Otherwise, you won’t be able to query GA4 data in BigQuery:
#5 You must be familiar with the BigQuery User Interface and GA4BigQuery export schema.
Note: If you are brand new to BigQuery, then read this article first: Google Analytics BigQuery Tutorial
#6 You must have access to ChatGPT‘s paid version (‘Plus’ or ‘Team’) as custom GPTs like ‘GA4 BigQuery Composer’ are available only in the paid versions.
Now, let’s move to the best part, i.e. generating SQL via ‘GA4 BigQuery Composer’.
Example-1
Consider the following data table:
This data table is for March 15, 2023 which is denoted by 20230315
The table id for this data table is: dbrt-ga4.analytics_207472454.events_20230315
Let us suppose you want to see only the ‘event_name’ and ‘event_date’ columns in the table.
Let us also suppose you want to see only the first 10 rows.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which shows only the ‘event_name’ and ‘event_date’ columns of the data table and show only the top 10 rows for March 15, 2023.
Provide your table ID when asked.
You should now see the required SQL:
Copy-paste the SQL code into the SQL editor (of your BigQuery project) and then click on the ‘Run’ button:
You should now see the query results in the window below:
From the screenshot above, we can conclude that our data table contains 10 rows and the following two columns: ‘event_name’ and ‘event_date’. This proves that our SQL was correct.
Example-2:
Consider the following data table:
You want to query the data which meets the following requirements:
- You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
- You want to see all the columns of the data table.
- You want to see only those rows where the ‘event_name’ is ‘page_view’
Use the following text prompt in ‘GA4 BigQuery Composer’
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Shows all the columns of the data table.
3) Shows only those rows where the ‘event_name’ is ‘page_view’
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
From the screenshot above, we can conclude that our data table contains all columns but only those rows where the ‘event_name’ is ‘page_view.’ This proves that our SQL was correct.
Example-3:
You want to query the data which meets the following requirements:
- You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
- You want to see all the columns of the data table.
- You want to see only those rows where the ‘event_name’ is ‘page_view’
- You want to see only the first 10 rows.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Shows all the columns of the data table.
3) Shows only those rows where the ‘event_name’ is ‘page_view’
4) Show only the first 10 rows.
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
From the screenshot above, we can conclude that our data table contains all columns, but only those rows where the ‘event_name’ is ‘page_view’, and it contains only 10 rows.
This proves that our SQL was correct.
Example-4:
You want to query the data which meets the following requirements:
- You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
- You want to see only the ‘event_name’ column
- You want to see only those rows where the ‘event_name’ is ‘session_start’.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show only the ‘event_name’ column of the data table.
3) Shows only those rows where the ‘event_name’ is ‘session_start’
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-5:
You want to query the data which meets the following requirements:
- You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
- You want to see only the ‘event_name’ column
- You want to see only those rows where the ‘event_name’ is ‘session_start’
- You want to see ‘event_name’ column as ‘count of session_start’
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show only the ‘event_name’ column of the data table.
3) Shows only those rows where the ‘event_name’ is ‘session_start’
4) Show the value of ‘event_name’ column as ‘count of session_start’
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
The provided SQL query retrieves data specifically for sessions that started on March 15, 2023.
Example-6:
You want to query the data which meets the following requirements:
- You want to query data between the date range March 17, 2023 and March 19, 2023.
- You want to see only the ‘event_date’ column.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data between the date range March 17, 2023 and March 19, 2023.
2) Show only the ‘event_date’ column of the data table.
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-7:
You want to query the data which meets the following requirements:
- You want to query data between the date range March 17, 2023 and March 19, 2023
- You want to see only the ‘event_date’ column.
- You want to sort the ‘event_date’ column in ascending order.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data between the date range March 17, 2023 and March 19, 2023.
2) Show only the ‘event_date’ column of the data table.
3) Sort the ‘event_date’ column in ascending order.
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-8:
Consider the following data table:
You want to query the data which meets the following requirements:
- You want to query data for March 15, 2023
- You want to see the ‘event_name’ column
- You want to see the value of ‘page_location’ event parameter.
- You want to see the value of ‘page_location’ as ‘page_location’ column.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show the ‘event_name’ column of the data table.
3) Show the value of ‘page_location’ event parameter as ‘page_location’ column.
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
Pay attention to the last two line of the text prompt:
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
It is a good practice to let ‘GA4 BigQuery Composer’ know how to retrieve the value of the ‘page_location’ event parameter.
GA4 BigQuery Composer should produce the following SQL:
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-9:
You want to query the data which meets the following requirements:
- You want to query data for March 15, 2023
- You want to see the ‘event_name’ column where ‘event_name’ is ‘first_visit’
- You want to see the value of ‘page_title’ event parameter.
- You want to see the value of ‘page_title’ as ‘page_title’ column.
- You want to see the value of ‘page_location’ event parameter.
- You want to see the value of ‘page_location’ as ‘page_location’ column.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show the ‘event_name’ column of the data table where the value of ‘event_name’ is ‘first_visit’
3) Show the value of ‘page_title’ event parameter as ‘page_title’ column.
The ‘page_title’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_title’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
4) Show the value of ‘page_location’ event parameter as ‘page_location’ column.
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
GA4 BigQuery Composer should produce the following SQL:
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-10:
You want to query the data which meets the following requirements:
- You want to query data for March 15, 2023
- You want to see the ‘event_name’ column where ‘event_name’ is ‘first_visit’
- You want to see the value of ‘page_title’ event parameter.
- You want to see the value of ‘page_title’ as ‘page_title’ column.
- You want to see the value of ‘page_location’ event parameter.
- You want to see the value of ‘page_location’ as ‘page_location’ column.
- You want to see the value of ‘engaged_session_event’ event parameter.
- You want to see the value of ‘engaged_session_event’ as ‘engaged_session_event’ column.
- You want to see the value of ‘engaged_session_event’ column where ‘engaged_session_event’ is 1
Use the following text prompt in ChatGPT:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show the ‘event_name’ column of the data table where the value of ‘event_name’ is ‘first_visit’
3) Show the value of ‘page_title’ event parameter as ‘page_title’ column.
The ‘page_title’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_title’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
4) Show the value of ‘page_location’ event parameter as ‘page_location’ column.
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
5) Show the value of ‘engaged_session_event’ event parameter as ‘engaged_session_event’ column.
The ‘engaged_session_event’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘engaged_session_event’ event parameter is obtained by selecting ‘event_params.value.int_value’ field.
6) Show only those values of ‘engaged_session_event’ event parameter where ‘engaged_session_event’ is equal to 1.
GA4 BigQuery Composer should produce the following SQL:
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Other articles on GA4 BigQuery
#1 BigQuery Introduction
- How to create a new Google Cloud Platform account.
- How to create a new BigQuery project.
- What is Google BigQuery Sandbox and how to use it.
- Understanding the BigQuery User Interface.
- What is BigQuery Data Transfer Service & how it works.
- How to create data transfer in BigQuery.
- Connect and transfer data from Google Sheets to BigQuery.
- How to access BigQuery Public Data Sets.
- Best Supermetrics Alternative – Dataddo.
#2 GA4 BigQuery Introduction
- Google Analytics 4 BigQuery Tutorial for Beginners to Advanced.
- GA4 Bigquery Export Schema Tutorial.
- GA4 BigQuery – Connect Google Analytics 4 with BigQuery.
- events_ & events_intraday_ tables in BigQuery for GA4 (Google Analytics 4).
- pseudonymous_users_ & users_ data tables in BigQuery for GA4 (Google Analytics 4).
- How to access GA4 Sample Data in BigQuery.
- Advantages of using Google BigQuery for Google Analytics 4.
- Impact of Google Advanced Consent Mode on BigQuery & GDPR.
#3 GA4 BigQuery Data Transfer
- How to Connect and Export Data from GA4 to BigQuery
- How to backfill GA4 data in BigQuery.
- How to overcome GA4 BigQuery Export limit.
- How to Send Custom GA4 Data to BigQuery.
- How to backup Universal Analytics data to BigQuery.
- How to send data from Google Ads to BigQuery.
- How to send data from Google Search Console to BigQuery.
- Sending data from Google Analytics to BigQuery without 360.
- How to send data from Facebook ads to BigQuery.
- How to pull custom data from Google Analytics to BigQuery.
#4 BigQuery Cost Optimization
- Guide to BigQuery Cost Optimization.
- Using Google Cloud pricing calculator for BigQuery.
- Cost of using BigQuery for Google Analytics 4.
#5 Query GA4 BigQuery Data
- How to query Google Analytics data in BigQuery.
- Query GA4 data in BigQuery without understanding SQL.
- Using GA4 BigQuery SQL generator to create SQL queries.
- New vs Returning users in GA4 BigQuery data table.
- GA4 BigQuery Composer Tutorial for ChatGPT.
- How to track GA4 BigQuery Schema Change.
- Calculating Sessions and Engaged Sessions in GA4 BigQuery.
- Calculating Total Users in GA4 BigQuery.
#6 GA4 to BigQuery Dimension/Metric Mapping.
- GA4 to BigQuery Mapping Tutorial.
- GA4 Attribution Dimensions to BigQuery Mapping.
- GA4 Google Ads Dimensions to BigQuery Mapping.
- GA4 Demographic Dimensions to BigQuery Mapping.
- GA4 Ecommerce Dimensions to BigQuery Mapping.
- GA4 Event-Scoped Ecommerce Metrics to BigQuery Mapping.
- GA4 Item-Scoped Ecommerce Metrics to BigQuery Mapping.
- GA4 Revenue Metrics to BigQuery Mapping.
- GA4 Event Dimensions to BigQuery Mapping.
- GA4 Event Metrics to BigQuery Mapping.
- GA4 Geography Dimensions to BigQuery Mapping.
- GA4 Link Dimensions to BigQuery Mapping.
- GA4 Page/Screen Dimensions to BigQuery Mapping.
- GA4 Page/Screen Metrics to BigQuery Mapping.
- GA4 Platform/Device Dimensions to BigQuery Mapping.
- GA4 User-Scoped Traffic Dimensions to BigQuery Mapping.
- GA4 Session-Scoped Traffic Dimensions to BigQuery Mapping.
- GA4 Session Metrics to BigQuery Mapping.
- GA4 User Dimensions to BigQuery Mapping.
- GA4 User Metrics to BigQuery Mapping.
- GA4 Advertising Metrics to BigQuery Mapping.
-
In this article, I will show you how to query GA4 data in BigQuery without understanding a single line of SQL code.
I have been using GA4 with BigQuery for over 1100 days (i.e. over 3 years) now.
I also teach BigQuery.
A lot of people don’t use BigQuery because they need to learn SQL first, and SQL is hard.
My goal is to make BigQuery accessible to all and make the knowledge of SQL irrelevant.
What is BigQuery?
BigQuery is a data warehouse from Google that is used to provide business intelligence through reports and dashboards.
It is a data storage and management system used to bring data from several data sources (like Google Analytics, Google Ads, Facebook, etc.) for the purpose of reporting and analysis.
BigQuery is one of the products of the Google Cloud platform.
Therefore, in order to use BigQuery, you will need a Google Cloud Platform account.
Why should you use GA4 with BigQuery?
To benefit from GA4 the most, learn and master BigQuery.
When you use GA4 with BigQuery, you can:
- Save your user-specific data from being deleted.
- You can access unsampled raw events and user-level data.
- Your data is not subject to cardinality limits.
- You actually own your data.
- You can do advanced data manipulation.
- You can integrate GA4 data with other data sources.
- You can work retroactively on GA4 data.
- BigQuery lets you easily filter out or modify incorrect GA4 data.
#1 Save your user-specific data from being deleted.
Most people don’t know, but they lose user-specific data for inactive website users in GA4 every 2 or 14 months based on their current data retention settings:
To avoid this data loss, connect your GA4 property to BigQuery and start collecting GA4 data there.
#2 You can access unsampled raw events and user-level data.
In GA4, the raw events and user-level data are only available via the explorations reports, but they are subject to data sampling.
On the other hand, BigQuery data tables do not suffer from data sampling issues.
You get access to unsampled raw events and user-level data when you use BigQuery with GA4.
#3 Your data is not subject to cardinality limits.
When you access GA4 data via the reporting interface or the data API, it can suffer from cardinality issues.
When using GA4 reports, you should avoid cardinality wherever possible. That’s how you can remove the (other) row from appearing in your GA4 reports.
Cardinality refers to the number of unique values in a data set.
A data set is a set of observed values for a particular variable.
For example,
Consider the following data set: {20, 15, 61, 8}
This data set has four unique values. Therefore the cardinality of this data set is four.
The data set with many unique values is called a high cardinality data set.
Google defines high cardinality dimensions as dimensions with more than 500 unique values in one day.
So if GA4 reports more than 500 unique values for the ‘Item name’ dimension in one day, then it would be considered as a high cardinality dimension.
When you use a high cardinality dimension in a GA4 report, it could result in some (or a lot of data) reported under the (other) row:
#4 You actually own your data.
When you use BigQuery to collect and store GA4 data, you get complete control and ownership of your data.
This means you can choose how long to keep your data, who has access to it, how it is used, etc.
When you use GA4 to collect and store your data, your data is stored on Google’s servers and is subject to Google’s data retention policies.
#5 You can do advanced data manipulation.
When you use BigQuery, you can manipulate GA4 data in ways that are many times not possible when using the GA4 user interface or the data API.
One of the biggest advantages of using BigQuery is that it does not have the same limitations as the GA4 user interface and API regarding which dimensions and metrics can be queried together.
That means you can perform more advanced data segmentation and analysis when using GA4 with BigQuery.
When you use GA4 with BigQuery, you get access to a wide range of tools and features that are not available in the GA4 user interface or the data API.
For example,
With BigQuery, you can create custom queries to extract specific data, join data from multiple data sources, and perform complex calculations and aggregations.
BigQuery also provides machine learning capabilities that can help you forecast future trends and gain deeper insights.
As a result of these capabilities, many companies and web analysts use BigQuery as their primary tool for querying analytics data.
They may use the GA4 user interface and API for basic queries and reporting but rely mainly on BigQuery for more advanced analysis and insights.
#6 You can integrate GA4 data with other data sources.
Since BigQuery is a data warehouse, it allows you to import data from multiple data sources and then combine and correlate them with GA4 data.
These data sources could include (but are not limited to) CRMs, shopping carts, marketing automation platforms, etc.
You can easily integrate BigQuery with data visualisation tools like Looker Studio and can thus easily visualise GA4 data.
Integrating GA4 data with other data sources can provide several key advantages, such as a better understanding of the customer’s purchase journey, better decision-making, and improved marketing effectiveness.
#7 You can work retroactively on GA4 data
When you use the GA4 user interface or the data API, the conversions and filters do not work retroactively and are collected and reported only from when you first set up your tracking.
That means that if you want to calculate conversions based on historical data, it is not possible with the GA4 user interface or data API.
However, when you use BigQuery to store your GA4 data, you can work retroactively on your data and perform calculations and analysis based on historical data.
#8 BigQuery lets you easily filter out or modify incorrect GA4 data.
When you use GA4 with BigQuery, you can easily filter out or modify incorrect GA4 data from your analysis and reports.
This can help ensure that your data analysis and reports are based on accurate data and can lead to better decision-making.
In contrast, when you use the GA4 user interface or data API to query data, filtering out or modifying incorrect data can be more challenging.
This is because the GA4 user interface and data API do not provide SQL querying capabilities, which makes it harder to manipulate the data to filter out or modify incorrect data.
Why are people reluctant to use BigQuery?
Despite several advantages of using GA4 with BigQuery, many people are still reluctant to use BigQuery and rely on either the GA4 user interface or the data API.
They are two main reasons why people are reluctant to use BigQuery:
#1. They think it is crazy expensive to use, but it is not (for the majority of companies).
#2 They need to know SQL really well.
Related Articles:
What is SQL?
BigQuery is of little use to you if you can’t query the data.
But to query the data in BigQuery, you need to know SQL.
SQL (Structured Query Language) is a programming language which is used to store, access and manipulate data in a database like BigQuery.
Why should you use natural language to generate SQL?
SQL has a complex syntax and structure.
You will need to learn and memorise many keywords, operators, and functions. And understanding how to combine them can be challenging.
SQL queries can become progressively harder to understand and debug as they become more complex.
Even small syntax errors or logical mistakes can result in incorrect results.
Complex queries often involve nested subqueries, multiple joins, and complex logic, which can be difficult to follow.
All of this can make SQL hard to learn and use, especially for absolute beginners.
With the advent of AI-powered tools like GA4 BigQuery Composer, it is now possible to convert natural language into SQL queries.
Natural language refers to the language humans use daily to communicate, such as English, French, Spanish, etc.
We use the words ‘natural language’ mainly in the context of machine languages (also known as artificial languages or programming languages).
To convert natural language into SQL query, you give instructions to ‘GA4 BigQuery Composer’ in plain English, and then the composer converts your text-based instructions into SQL for BigQuery.
For example:
What is GA4 BigQuery Composer?
GA4 BigQuery Composer is a ChatGPT plugin for creating efficient, precise SQL queries for GA4 BigQuery.
What GA4 BigQuery Composer can do that other tools can not do?
#1 Through ‘GA4 BigQuery composer’ tool, you can write SQL queries for Google BigQuery even when you don’t understand a single line of code. The Composer will automatically convert your instructions in plain English into SQL code.
For example:
#2 Unlike the regular chatgpt, the ‘GA4 BigQuery composer’ is specially trained on GA4 BigQuery data sets. So, its results are likely to be much more accurate.
#3 Through ‘GA4 BigQuery composer’, you can generate even complex SQL queries within seconds.
#4 The composer allows you to troubleshoot SQL even when you don’t understand a single line of code. Just copy-paste the error message into the Composer tool, and it will rewrite the SQL for you.
For example:
#5 If you are not satisfied with the SQL generated by ‘Composer’ (maybe you want to rename a particular column, add a new field, or sort data by a particular column), it can re-write the SQL for you within seconds.
#6 When you use ‘GA4 BigQuery composer’, its usage drastically reduces your chances of making errors in query syntax or structure while writing SQL queries.
#7 By automating the query-building process, the Composer saves tons of your time.
To learn more about the ‘GA4 BigQuery composer’ and how it works, check out this article: GA4 BigQuery Composer Tutorial for ChatGPT.
Prerequisites for querying GA4 data in BigQuery without understanding SQL
#1 You need a Google Cloud Platform account with billing enabled. To enable the billing, you need a valid credit card.
#2 You need a project to store GA4 data. If you don’t have one, then create a new project in Google Cloud Console to store GA4 data.
#3 Your GA4 property must be connected to your BigQuery project.
#4 You need to have at least a couple of days of data in the events_data table. Otherwise, you won’t be able to query GA4 data in BigQuery:
#5 You must be familiar with the BigQuery User Interface and GA4BigQuery export schema.
Note: If you are brand new to BigQuery, then read this article first: Google Analytics BigQuery Tutorial
#6 You must have access to ChatGPT‘s paid version (‘Plus’ or ‘Team’) as custom GPTs like ‘GA4 BigQuery Composer’ are available only in the paid versions.
Now, let’s move to the best part, i.e. generating SQL via ‘GA4 BigQuery Composer’.
Example-1
Consider the following data table:
This data table is for March 15, 2023 which is denoted by 20230315
The table id for this data table is: dbrt-ga4.analytics_207472454.events_20230315
Let us suppose you want to see only the ‘event_name’ and ‘event_date’ columns in the table.
Let us also suppose you want to see only the first 10 rows.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which shows only the ‘event_name’ and ‘event_date’ columns of the data table and show only the top 10 rows for March 15, 2023.
Provide your table ID when asked.
You should now see the required SQL:
Copy-paste the SQL code into the SQL editor (of your BigQuery project) and then click on the ‘Run’ button:
You should now see the query results in the window below:
From the screenshot above, we can conclude that our data table contains 10 rows and the following two columns: ‘event_name’ and ‘event_date’. This proves that our SQL was correct.
Example-2:
Consider the following data table:
You want to query the data which meets the following requirements:
- You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
- You want to see all the columns of the data table.
- You want to see only those rows where the ‘event_name’ is ‘page_view’
Use the following text prompt in ‘GA4 BigQuery Composer’
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Shows all the columns of the data table.
3) Shows only those rows where the ‘event_name’ is ‘page_view’
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
From the screenshot above, we can conclude that our data table contains all columns but only those rows where the ‘event_name’ is ‘page_view.’ This proves that our SQL was correct.
Example-3:
You want to query the data which meets the following requirements:
- You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
- You want to see all the columns of the data table.
- You want to see only those rows where the ‘event_name’ is ‘page_view’
- You want to see only the first 10 rows.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Shows all the columns of the data table.
3) Shows only those rows where the ‘event_name’ is ‘page_view’
4) Show only the first 10 rows.
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
From the screenshot above, we can conclude that our data table contains all columns, but only those rows where the ‘event_name’ is ‘page_view’, and it contains only 10 rows.
This proves that our SQL was correct.
Example-4:
You want to query the data which meets the following requirements:
- You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
- You want to see only the ‘event_name’ column
- You want to see only those rows where the ‘event_name’ is ‘session_start’.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show only the ‘event_name’ column of the data table.
3) Shows only those rows where the ‘event_name’ is ‘session_start’
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-5:
You want to query the data which meets the following requirements:
- You want to query data from the data table `dbrt-ga4.analytics_207472454.events_20230315`
- You want to see only the ‘event_name’ column
- You want to see only those rows where the ‘event_name’ is ‘session_start’
- You want to see ‘event_name’ column as ‘count of session_start’
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show only the ‘event_name’ column of the data table.
3) Shows only those rows where the ‘event_name’ is ‘session_start’
4) Show the value of ‘event_name’ column as ‘count of session_start’
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
The provided SQL query retrieves data specifically for sessions that started on March 15, 2023.
Example-6:
You want to query the data which meets the following requirements:
- You want to query data between the date range March 17, 2023 and March 19, 2023.
- You want to see only the ‘event_date’ column.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data between the date range March 17, 2023 and March 19, 2023.
2) Show only the ‘event_date’ column of the data table.
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-7:
You want to query the data which meets the following requirements:
- You want to query data between the date range March 17, 2023 and March 19, 2023
- You want to see only the ‘event_date’ column.
- You want to sort the ‘event_date’ column in ascending order.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data between the date range March 17, 2023 and March 19, 2023.
2) Show only the ‘event_date’ column of the data table.
3) Sort the ‘event_date’ column in ascending order.
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-8:
Consider the following data table:
You want to query the data which meets the following requirements:
- You want to query data for March 15, 2023
- You want to see the ‘event_name’ column
- You want to see the value of ‘page_location’ event parameter.
- You want to see the value of ‘page_location’ as ‘page_location’ column.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show the ‘event_name’ column of the data table.
3) Show the value of ‘page_location’ event parameter as ‘page_location’ column.
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
Pay attention to the last two line of the text prompt:
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
It is a good practice to let ‘GA4 BigQuery Composer’ know how to retrieve the value of the ‘page_location’ event parameter.
GA4 BigQuery Composer should produce the following SQL:
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-9:
You want to query the data which meets the following requirements:
- You want to query data for March 15, 2023
- You want to see the ‘event_name’ column where ‘event_name’ is ‘first_visit’
- You want to see the value of ‘page_title’ event parameter.
- You want to see the value of ‘page_title’ as ‘page_title’ column.
- You want to see the value of ‘page_location’ event parameter.
- You want to see the value of ‘page_location’ as ‘page_location’ column.
Use the following text prompt in ‘GA4 BigQuery Composer’:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show the ‘event_name’ column of the data table where the value of ‘event_name’ is ‘first_visit’
3) Show the value of ‘page_title’ event parameter as ‘page_title’ column.
The ‘page_title’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_title’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
4) Show the value of ‘page_location’ event parameter as ‘page_location’ column.
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
GA4 BigQuery Composer should produce the following SQL:
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Example-10:
You want to query the data which meets the following requirements:
- You want to query data for March 15, 2023
- You want to see the ‘event_name’ column where ‘event_name’ is ‘first_visit’
- You want to see the value of ‘page_title’ event parameter.
- You want to see the value of ‘page_title’ as ‘page_title’ column.
- You want to see the value of ‘page_location’ event parameter.
- You want to see the value of ‘page_location’ as ‘page_location’ column.
- You want to see the value of ‘engaged_session_event’ event parameter.
- You want to see the value of ‘engaged_session_event’ as ‘engaged_session_event’ column.
- You want to see the value of ‘engaged_session_event’ column where ‘engaged_session_event’ is 1
Use the following text prompt in ChatGPT:
Write SQL which meets the following requirements:
1) Shows data for March 15, 2023
2) Show the ‘event_name’ column of the data table where the value of ‘event_name’ is ‘first_visit’
3) Show the value of ‘page_title’ event parameter as ‘page_title’ column.
The ‘page_title’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_title’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
4) Show the value of ‘page_location’ event parameter as ‘page_location’ column.
The ‘page_location’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘page_location’ event parameter is obtained by selecting ‘event_params.value.string_value’ field.
5) Show the value of ‘engaged_session_event’ event parameter as ‘engaged_session_event’ column.
The ‘engaged_session_event’ event parameter is a nested field (array of structures) within the ‘events_parms’ nested field.
The value of the ‘engaged_session_event’ event parameter is obtained by selecting ‘event_params.value.int_value’ field.
6) Show only those values of ‘engaged_session_event’ event parameter where ‘engaged_session_event’ is equal to 1.
GA4 BigQuery Composer should produce the following SQL:
Copy-paste this code into the SQL editor and then click on the ‘Run’ button:
Other articles on GA4 BigQuery
#1 BigQuery Introduction
- How to create a new Google Cloud Platform account.
- How to create a new BigQuery project.
- What is Google BigQuery Sandbox and how to use it.
- Understanding the BigQuery User Interface.
- What is BigQuery Data Transfer Service & how it works.
- How to create data transfer in BigQuery.
- Connect and transfer data from Google Sheets to BigQuery.
- How to access BigQuery Public Data Sets.
- Best Supermetrics Alternative – Dataddo.
#2 GA4 BigQuery Introduction
- Google Analytics 4 BigQuery Tutorial for Beginners to Advanced.
- GA4 Bigquery Export Schema Tutorial.
- GA4 BigQuery – Connect Google Analytics 4 with BigQuery.
- events_ & events_intraday_ tables in BigQuery for GA4 (Google Analytics 4).
- pseudonymous_users_ & users_ data tables in BigQuery for GA4 (Google Analytics 4).
- How to access GA4 Sample Data in BigQuery.
- Advantages of using Google BigQuery for Google Analytics 4.
- Impact of Google Advanced Consent Mode on BigQuery & GDPR.
#3 GA4 BigQuery Data Transfer
- How to Connect and Export Data from GA4 to BigQuery
- How to backfill GA4 data in BigQuery.
- How to overcome GA4 BigQuery Export limit.
- How to Send Custom GA4 Data to BigQuery.
- How to backup Universal Analytics data to BigQuery.
- How to send data from Google Ads to BigQuery.
- How to send data from Google Search Console to BigQuery.
- Sending data from Google Analytics to BigQuery without 360.
- How to send data from Facebook ads to BigQuery.
- How to pull custom data from Google Analytics to BigQuery.
#4 BigQuery Cost Optimization
- Guide to BigQuery Cost Optimization.
- Using Google Cloud pricing calculator for BigQuery.
- Cost of using BigQuery for Google Analytics 4.
#5 Query GA4 BigQuery Data
- How to query Google Analytics data in BigQuery.
- Query GA4 data in BigQuery without understanding SQL.
- Using GA4 BigQuery SQL generator to create SQL queries.
- New vs Returning users in GA4 BigQuery data table.
- GA4 BigQuery Composer Tutorial for ChatGPT.
- How to track GA4 BigQuery Schema Change.
- Calculating Sessions and Engaged Sessions in GA4 BigQuery.
- Calculating Total Users in GA4 BigQuery.
#6 GA4 to BigQuery Dimension/Metric Mapping.
- GA4 to BigQuery Mapping Tutorial.
- GA4 Attribution Dimensions to BigQuery Mapping.
- GA4 Google Ads Dimensions to BigQuery Mapping.
- GA4 Demographic Dimensions to BigQuery Mapping.
- GA4 Ecommerce Dimensions to BigQuery Mapping.
- GA4 Event-Scoped Ecommerce Metrics to BigQuery Mapping.
- GA4 Item-Scoped Ecommerce Metrics to BigQuery Mapping.
- GA4 Revenue Metrics to BigQuery Mapping.
- GA4 Event Dimensions to BigQuery Mapping.
- GA4 Event Metrics to BigQuery Mapping.
- GA4 Geography Dimensions to BigQuery Mapping.
- GA4 Link Dimensions to BigQuery Mapping.
- GA4 Page/Screen Dimensions to BigQuery Mapping.
- GA4 Page/Screen Metrics to BigQuery Mapping.
- GA4 Platform/Device Dimensions to BigQuery Mapping.
- GA4 User-Scoped Traffic Dimensions to BigQuery Mapping.
- GA4 Session-Scoped Traffic Dimensions to BigQuery Mapping.
- GA4 Session Metrics to BigQuery Mapping.
- GA4 User Dimensions to BigQuery Mapping.
- GA4 User Metrics to BigQuery Mapping.
- GA4 Advertising Metrics to BigQuery Mapping.
My best selling books on Digital Analytics and Conversion Optimization
Maths and Stats for Web Analytics and Conversion Optimization
This expert guide will teach you how to leverage the knowledge of maths and statistics in order to accurately interpret data and take actions, which can quickly improve the bottom-line of your online business.
Master the Essentials of Email Marketing Analytics
This book focuses solely on the ‘analytics’ that power your email marketing optimization program and will help you dramatically reduce your cost per acquisition and increase marketing ROI by tracking the performance of the various KPIs and metrics used for email marketing.
Attribution Modelling in Google Analytics and BeyondSECOND EDITION OUT NOW!
Attribution modelling is the process of determining the most effective marketing channels for investment. This book has been written to help you implement attribution modelling. It will teach you how to leverage the knowledge of attribution modelling in order to allocate marketing budget and understand buying behaviour.
Attribution Modelling in Google Ads and Facebook
This book has been written to help you implement attribution modelling in Google Ads (Google AdWords) and Facebook. It will teach you, how to leverage the knowledge of attribution modelling in order to understand the customer purchasing journey and determine the most effective marketing channels for investment.