GA4 BigQuery Export Schema Tutorial

Once you understand how the GA4 data is stored in BigQuery data tables, it will become easier for you to query it.

What is the GA4 BigQuery Export Schema?

The GA4 BigQuery export schema refers to the structure of GA4 and Firebase data that is exported to a Google BigQuery project.

This schema defines how the data is organized within datasets and data tables.

To understand the GA4 BigQuery export schema, you will first need to understand the basic structure of how data is stored in a BigQuery project.

  • Google Cloud Platform (GCP) can consist of one or more organizations.
  • Each organization can consist of one or more projects.
  • One such project can be a BigQuery project. It’s the environment within which all BigQuery datasets and operations reside.
  • Each project has a name, project number, and project ID.
  • Each project can consist of one or more datasets. 
  • Each dataset can consist of one or more data tables.

Project ID and Data Sets

Get weekly practical tips on GA4 and/or BigQuery to accurately track and read your analytics data.

 

When you click on a project ID, you can see the datasets:

project ID
datasets

Here, the project id ‘dbrt-ga4’ contains the following data sets:

  1. analytics_207472454
  2. custom_ga4
  3. ga3_data_backup
  4. google_ads
  5. historical_ga4_data
  6. searchconsole

For every GA4 property and each Firebase project connected to your BigQuery project, a unique dataset called “analytics_<property_id>” is created in your BigQuery project.

The “property_id” is your GA4 Property ID. 

The dataset named in the following format: “analytics_<property_id>” is the dataset meant for GA4:

dataset for storing ga4 data in bigquery

Click on the dataset named in the following format: “analytics_<property_id>” .

You should now be able to see the data set information like ‘Dataset ID’, Dataset creation date, Last modification date, Data location etc:

dataset for storing ga4 data in bigquery info

Note down the Dataset ID by copying it to the clipboard:

Note down the Dataset ID 1

We are going to reference this dataset ID later when querying the data.

Click on the ‘analytics_’ dataset again. 

You should now see the following three to four data tables:

data tables for storing ga4 data in bigquery

Each dataset is made up of one or more data tables.

The ‘analytics_207472454’ dataset contains the following four data tables:

  1. events_(<number of days>)
  2. events_intraday_<number of days>
  3. pseudonymous_users_<number of days>
  4. users_<number of days>

‘events_’ and ‘events_intraday_’ Data Tables

The ‘events_’ and ‘events_intraday_’ data tables contain event-based and user-based GA4 export data in BigQuery.

All the GA4 event data from the previous day(s) is available in the ‘events_’ data table. 

This table is automatically imported for each day of export.

‘events data table ga4 bigquery

events_(1) means all of the GA4 event data from the previous day are available in this data table.

events_(2) means all of the GA4 event data from the previous two days are available in this data table.

events_(3) means all of the GA4 event data from the previous three days are available in this data table.

Similarly,

events_(1187) means all of the GA4 event data from the previous 1187 days are available in this data table.

All the GA4 event data from the current day is available in the ‘events_intraday_’ data table.

‘events intraday data table ga4 bigquery

This table is automatically updated throughout the day. That’s why it is called ‘events_intraday’ table.

Note: We usually do not query GA4 data from the ‘events_intraday_’ data table.

‘pseudonymous_users_’ and ‘users _’ Data Tables

The ‘pseudonymous_users_’ and ‘users _’ data tables contain only user-based GA4 export data in BigQuery.

The advantage of using the ‘pseudonymous_users_’ and ‘users _’ data tables over the ‘events_’ and ‘events_intraday_’ data tables is that you get access to more user data.

The ‘pseudonymous_users_’ and ‘users _’ data tables contain audience and prediction data which is not available in the ‘events_’ and ‘events_intraday_’ data tables.

The ‘pseudonymous_users_’ data table contains all the data for every pseudonymous identifier that is not user ID.

‘pseudonymous users data table ga4 bigquery

A pseudonymous identifier is a unique identifier (like ‘Google Signals’ or ‘Device ID’) created and assigned to each user by Google Analytics to track users across devices and sessions and to create a complete picture of their behaviour. 

The ‘pseudonymous_users_’ data table is updated whenever data for a user is updated.

pseudonymous_users_(1) means all the data for every pseudonymous identifier that is not a user ID from the previous day is available in this data table.

pseudonymous_users_(2) means all the data for every pseudonymous identifier that is not a user ID from the previous two days is available in this data table.

Similarly, 

pseudonymous_users_(236) means all the data for every pseudonymous identifier that is not a user ID from the previous 236 days is available in this data table.

The ‘users_’ data table contains all the data for every pseudonymous identifier that is a user ID.

‘users data table ga4 bigquery

Data for a user is updated when there is a change to one of the fields.

Note: The ‘users_’ data table is not available to you in your BigQuery project if you are not using the user-id tracking in GA4.

The SCHEMA tab of the data table

Clicking on the ‘events_’ data table will show you the structure of that table (also known as ‘Schema’):

The SCHEMA tab of the GA4 BigQuery data table

The schema shows you how the data table has been set up, what type of values it accepts, etc.

Take a close look at the various fields available under the ‘SCHEMA’ tab: 

ga4 bigquery data table event fields

We are going to reference these fields when querying the GA4 data.

Bookmark the [GA4] BigQuery Export schema help documentation from Google to find more information about each field:

find more information about each field

You can increase or decrease the size of the right-hand side panel by dragging via the mouse:

increase or decrease the size of the right hand side panel

Selecting Data Table based on Date

The ‘events_’ data tables are named as “events_YYYYMMDD” where “YYYYMMDD” refers to the date the table was imported to BigQuery.

YYYY denotes a year. For example, 2024

MM denotes a month. For example, 04 (i.e. April)

DD denotes a day. For example, 07

So the data table that was imported to BigQuery on April 7th, 2024 would be named as events_20240407

events 20240407

So you are looking at the data for April 7th, 2024.

If you want to look at data for a different date, then click on the date drop-down menu and select a different date:

look at data for a different date
look at data for a different date drop down menu

The DETAILS and the PREVIEW tabs of the Data Table

Click on the ‘DETAILS’ tab to get information about the data table:

get information about the data table
details tab ga4 bigquery data table

Take note of the table ID:

copy to clipboard table id bigquery

We are going to reference table ID later when querying the GA4 data.

Look at the ‘Storage Info’ section to determine the size of your data table:

‘Storage Info section ga4 bigquery data table

It is always a best practice to check the size of a table before querying the data from it.

If the size of the data table is just a few kilobytes (KB) or megabytes (MB), you don’t need to worry.

But if the table size is in gigabytes (GB), terabytes (TB) or petabytes (PB), you should be careful how you query your data.

Your monthly cost of using BigQuery depends upon the following factors:

#1 The amount of data you stored in BigQuery (i.e. the storage cost)

#2 The amount of data you processed by each query you run (i.e. the query cost).

The first 10 GB of active storage is free each month. After that, you would be charged $0.020 per GB of active storage.

The first 1 terabyte of data processed is free each month. After that, you would be charged $5 per terabyte (TB) of data processed.

Related Articles:

Click on the ‘Preview’ tab to view the data in the ‘events_’ data table:

preview the data in the ‘events data table
preview data in GA4 bigquery ‘events data table

It is always a best practice to preview a table before querying the data from it.

Many people, especially new users, run queries just to preview the data in a data table. This could considerably cost you if you accidentally queried gigabytes or terabytes of data. 

Instead of running queries just to preview the data in a data table, click on the ‘Preview’ tab to preview the table.

There is no cost for previewing the data table. 

The table preview will give you an idea of what type of data is available in the table without querying the table.

Rows and Columns of the Data Table

From the table preview, you can see that the table is made up of rows and columns:

ga4 bigquery data table is made up of rows and columns

Use the horizontal slider to see more columns:

Use the horizontal slider to see more columns in ga4 bigquery data table

Use the vertical slider to see more rows:

Use the vertical slider to see more rows in ga4 bigquery data table

Use the ‘Results per page’ drop-down menu if you want to see more than 50 rows:

the ‘Results per page drop down menu

Note: You can see up to 200 rows per page.

200 rows per page ga4 bigquery data table

To see the next 200 rows, press the > button:

the next 200 rows

You can re-size the width of each column in the data table.

re size the width of each column in the data table

How the GA4 data is stored in data tables

Each row of the data table corresponds to a single GA4 event. 

For example,

The first row corresponds to the ‘first_visit’ event:

first row ga4 bigquery data table

The second row corresponds to the ‘session_start’ event:

second row ga4 bigquery data table

Each column of the data table corresponds to a single key of the GA4 event parameter.

Each column of the data table corresponds to a single key of the GA4 event parameter

Event parameters in GA4 are the additional information about an event that are sent along with the event. The key of an event parameter is the name of the parameter, and the value of an event parameter is the data associated with the parameter.

key and value of the ga4 event parameter

Here,

‘event_name’ is the key of the event parameter, and ‘first-visit’ is one of the values of the event parameter.

Keys are also known as fields.

Keys are also known as fields

Fields can be regular or nested.

Fields can be regular or nested

A regular field is a single piece of data, such as a string, a number, or a date. A nested field is a collection of fields (often an array of structures) that are stored together as a single unit.

The event_params field is an array of structures that contains additional event parameters.

event params field is an array of structures

Here,

ga_session_id‘ is an event parameter. 

engaged_session_event‘ is an event parameter.

Similalry,

page_title‘ is an event parameter.

There can be nested fields within the nested field.

Nested fields can be nested within other nested fields to any depth. This is often referred to as hierarchical data. 

Hierarchical data is a type of data that is organized in a tree-like structure. Each node in the tree can contain other nodes, and so on.

For example, 

The ‘page_referrer’ is a nested field within the ‘event_params’ nested field. 

This means that the ‘page_referrer’ event parameter is itself an array of structures. Each structure in the array represents a single referrer.


The ‘page_location’ event parameter is a nested field within the ‘event_params’ nested field. 

This means that the page_location event parameter is itself an array of structures. Each structure in the array represents a single location.

nested fields within the nested field

Each event has information on event-specific parameters.  

Each event has information on event specific parameters 1

The information about GA4 event parameters is stored in the data table in the key-value format:

The information about GA4 event parameters 1

The key field (event_params.key) denotes the name of the event parameter.

For example: ‘page_title’:

page title

A value field is an object containing the event parameter’s value in one of its four fields: 

  1. string_value
  2. int_value
  3. float_value
  4. double_value.

So we can have the following value fields:

  1. event_params.value.string_value
  2. event_params.value.int_value
  3. event_params.value.float_value
  4. event_params.value.double_value
A value field is an object

The event_params.value.string_value field in GA4 is a nested field that stores the string value of an event parameter.

The event_params.value.int_value field in GA4 is a nested field that stores the integer value of an event parameter.

The event_params.value.float_value field in GA4 is a nested field that stores the float value of an event parameter.

The event_params.value.double_value field in GA4 is a nested field that stores the double value of an event parameter.

The specific field that is used to store the value of the event parameter depends on the type of value that the parameter is.

For example, if the event parameter is a string, the event_params.value.string_value field will be used to store the value of the parameter. 

Similarly, if the event parameter is an integer, the event_params.value.int_value field will be used to store the value of the parameter. And so on.

The following table lists the most common event parameters in the GA4 BigQuery data table:

the most common event parameters in the GA4 BigQuery data table

My best selling books on Digital Analytics and Conversion Optimization

Maths and Stats for Web Analytics and Conversion Optimization
This expert guide will teach you how to leverage the knowledge of maths and statistics in order to accurately interpret data and take actions, which can quickly improve the bottom-line of your online business.

Master the Essentials of Email Marketing Analytics
This book focuses solely on the ‘analytics’ that power your email marketing optimization program and will help you dramatically reduce your cost per acquisition and increase marketing ROI by tracking the performance of the various KPIs and metrics used for email marketing.

Attribution Modelling in Google Analytics and BeyondSECOND EDITION OUT NOW!
Attribution modelling is the process of determining the most effective marketing channels for investment. This book has been written to help you implement attribution modelling. It will teach you how to leverage the knowledge of attribution modelling in order to allocate marketing budget and understand buying behaviour.

Attribution Modelling in Google Ads and Facebook
This book has been written to help you implement attribution modelling in Google Ads (Google AdWords) and Facebook. It will teach you, how to leverage the knowledge of attribution modelling in order to understand the customer purchasing journey and determine the most effective marketing channels for investment.

About the Author

Himanshu Sharma

  • Founder, OptimizeSmart.com
  • Over 15 years of experience in digital analytics and marketing
  • Author of four best-selling books on digital analytics and conversion optimization
  • Nominated for Digital Analytics Association Awards for Excellence
  • Runs one of the most popular blogs in the world on digital analytics
  • Consultant to countless small and big businesses over the decade