GA4 Data Redaction – Remove PII from Google Analytics 4
Introduction to PII
PII stands for Personally Identifiable Information. It is a subset of personal data that can be used to uniquely identify an individual.
Any information that reveals an individual’s identity, contact details or location is PII.
Examples of PII
- email addresses
- phone numbers
- social security numbers
- names of the users
- precise location of users, etc.
Rules regarding sending PII data
It is against Google Analytics Terms of Service to send PII data to the Google Analytics server.
If you are found to collect PII in GA, then you may end up losing your GA account for good.
Google has implemented strict data privacy measures in GA4 to protect user data. It automatically masks any PII data if it detects it.
However, sometimes, PII data can still slip into your GA4 reports.
Finding PII data in GA4 reports
At least once a week, go through all of your reports in your GA4 property to find any accidental leak of PII data.
PII data can often appear as an event name, event parameter, page URL or page title.
PII data can also be passed through the User ID, the Measurement Protocol or Data Import.
For example, you can search for ‘@’ in the ‘Pages and Screens’ report to check if any email addresses have been captured.
If any email addresses have been captured, they will show up in the search results.
You can run the same check in the ‘Events’ report.
Below are some of the regex formats that you can use to identify the PII data in GA4 reports:
To identify full email ID formats:
([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})
To identify social security numbers:
(\d{3}-?\d{2}-?\d{4})
To identify addresses:
(drive|street|road|dr\.|po box|rd\.)
To identify phone numbers:
(\d{3}-?\d{3}-?\d{4})
To identify names:
(fn|ln|lastname|firstname|name|fullname)
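As a rough sketch of how these patterns could be applied in bulk, the Python snippet below runs them over strings such as page paths or event parameter values that you have exported from GA4. The function name `find_pii` and the pattern labels are illustrative assumptions, not part of any GA4 tooling. The address pattern escapes the dots so that "dr." and "rd." match a literal period.

```python
import re

# Patterns from the list above. The labels are made up for this sketch.
PII_PATTERNS = {
    "email": re.compile(r"([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})"),
    "ssn": re.compile(r"(\d{3}-?\d{2}-?\d{4})"),
    "address": re.compile(r"(drive|street|road|dr\.|po box|rd\.)", re.IGNORECASE),
    "phone": re.compile(r"(\d{3}-?\d{3}-?\d{4})"),
    "name_key": re.compile(r"(fn|ln|lastname|firstname|name|fullname)", re.IGNORECASE),
}


def find_pii(value: str) -> list[str]:
    """Return the label of every pattern that matches somewhere in value."""
    return [label for label, pattern in PII_PATTERNS.items() if pattern.search(value)]


if __name__ == "__main__":
    # Hypothetical page paths, standing in for rows exported from a report.
    for path in ["/thank-you?email=jane.doe@example.com", "/products/laptops"]:
        print(path, find_pii(path))
```

The name and address patterns are deliberately broad, so expect false positives and review the matches by hand before acting on them.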
Methods to remove PII from Google Analytics 4 (GA4)
Use the following methods to remove PII data from your GA4 property:
1. Use the GA4 API to delete PII data from specific reports or data streams programmatically.
2. Use the user deletion API to delete user data by specifying the client ID, user ID or app instance ID.
3. Use the data deletion requests feature in GA4 to delete PII data based on specific criteria, such as event name and event parameter.
4. Use the data retention settings in GA4 to delete PII data after a certain period automatically.
5. Use the GA4 Data Redaction feature to prevent the inadvertent collection and storage of PII data within GA4 data streams.
6. Use Google Tag Manager to prevent PII from being sent to GA4 in the first place.
What is GA4 data redaction?
Data redaction is a Google Analytics 4 (GA4) feature that removes Personally Identifiable Information (PII), such as names and email addresses, from the data collected by website events before it is sent to Google Analytics servers. Redacted values appear as “(redacted)” in your GA4 reports.
Note: The data redaction feature only applies to web data streams.
Why is redacting PII data in GA4 important?
Redacting PII in GA4 is essential for maintaining user privacy and adhering to GDPR guidelines.
Leaking users’ personal information in GA4 reports is against Google Analytics’ terms of service, risking the shutdown of your GA4 property.
How to implement data redaction in GA4?
Follow the steps below:
- Identify Redactable Data
- Navigate to Data Redaction Settings
- Redact Email Data
- Identify and Redact URL Query Parameters
- Test Your Redaction Settings
- Wait and Verify
Step 1: Identify Redactable Data
Log in to your GA4 property and look for reports that display URL query parameters. Typically, these can be found in reports related to pageviews or events where URLs are listed.
Common reports to check are the ‘Pages and screens’ and ‘Landing page’ reports, which show the URLs users visit.
If the standard reports don’t provide the necessary detail, utilize the “Explore” feature in GA4.
Create a new exploration and add “Page path + query string” or similar dimensions to see any query parameters in URLs.
GA4 allows the redaction of two types of data: email addresses and URL query parameters.
Email Addresses: GA4 is designed to recognize and redact email addresses. This is done by detecting common patterns in text that are typically used in email formats.
URL Query Parameters: Apart from email addresses, GA4 also allows the redaction of information found in URL query parameters. You can choose up to 30 specific query parameters you wish to redact.
URL query parameters are parts of a URL that send additional information to the server. They typically appear at the end of a URL, following a question mark (?), with each parameter separated by an ampersand (&).
These parameters often consist of a key-value pair, where the key is a unique identifier, and the value is the information associated with that key.
Following is an example of a URL that contains query parameters:
https://www.example.com/products?category=electronics&item=laptop&userEmail=[email protected]
In this URL, there are three query parameters:
- category=electronics
- item=laptop
- userEmail=[email protected]
Following is the example of the same URL after data redaction:
https://www.example.com/products?category=electronics&item=laptop&userEmail=(redacted)
Here, the value of the ‘userEmail’ parameter has been replaced with the text “(redacted)”. This conceals the email address while keeping the rest of the URL intact.
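To make the before-and-after transformation concrete, here is a minimal Python sketch that replaces the values of chosen query parameters with “(redacted)”. It only imitates the visible effect described above. It is not GA4's implementation, and the function name `redact_query_params`, the exact-match key comparison and the sample email address are assumptions made for illustration.

```python
from urllib.parse import unquote, urlsplit, urlunsplit

REDACTED = "(redacted)"


def redact_query_params(url: str, keys_to_redact: set[str]) -> str:
    """Replace the value of each listed query parameter with '(redacted)'."""
    parts = urlsplit(url)
    redacted_pairs = []
    # Work on the raw query string so untouched parameters keep their encoding.
    for pair in parts.query.split("&"):
        key, sep, value = pair.partition("=")
        # Compare on the decoded key (assumption: exact, case-sensitive match).
        if sep and unquote(key) in keys_to_redact:
            value = REDACTED
        redacted_pairs.append(f"{key}{sep}{value}")
    return urlunsplit(parts._replace(query="&".join(redacted_pairs)))


if __name__ == "__main__":
    # The email address here is a made-up placeholder.
    url = (
        "https://www.example.com/products"
        "?category=electronics&item=laptop&userEmail=jane@example.com"
    )
    print(redact_query_params(url, {"userEmail"}))
```

Parameters that are not listed pass through unchanged, which mirrors how the rest of the URL stays intact after redaction.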
Step 2: Navigate to Data Redaction Settings
Step-2.1: Navigate to the Admin section of your GA4 account.
Step-2.2: Click on “Data streams” and then click on your web data stream.
Step-2.3: Scroll down and click on the “Redact data” option.
Step 3: Redact Email Data
Note: For properties created before October 2023, you must manually enable email data redaction. Newer properties have this feature enabled by default.
Step 4: Identify and Redact URL Query Parameters
Step-4.1: Identify the query parameters which contain PII data. Let’s say ‘first_name’ and ‘postcode’ are such parameters.
Step-4.2: After identifying the query parameters, return to the “Redact data” section and switch on the toggle button next to ‘URL query parameters’.
Step-4.3: Enter the parameters you wish to redact.
Step 5: Test Your Redaction Settings
GA4 offers a feature to test how your selected query parameters will be redacted.
Step-5.1: Click on the ‘Test data redaction’ drop-down menu.
Step-5.2: Enter a URL containing the query parameters you want to redact.
Step-5.3: Click on the ‘Preview redacted data’ button.
You should now see the redacted version on the right-hand side.
Step-5.4: Scroll up and click on the ‘Save’ button on the top right-hand side.
Step 6: Wait and Verify
After implementing these settings, the redacted data might take up to 24 hours to appear. Verify the changes in the report you created.
Limitations of GA4 data redaction
Understanding these limitations is crucial for effectively using data redaction in GA4 and ensuring that the approach to data privacy and compliance is comprehensive and multifaceted.
#1 Limited to Web Data Streams
Data redaction is only available for web data streams in Google Analytics 4. It cannot be applied to data collected from mobile app data streams.
#2 Best-Effort Basis for Email Addresses
Identifying and redacting email addresses is done on a best-effort basis.
While GA4 aims to identify email addresses effectively, there may be instances where it is not completely accurate.
#3 Client-Side Occurrence
Data redaction occurs on the client side. This means it happens after GA4 modifies or creates events (also client-side) and before the data is sent to GA4 servers.
Data redaction does not occur server-side or after data has been stored.
#4 Handling of Percent-Encoded URL Query Parameters
GA4 accepts percent-encoded URL query parameters, including Unicode characters that browsers recognise.
Let’s say we have a URL parameter like 学生 (meaning “student” in Chinese).
In the URL, it appears as a percent-encoded string.
Test URL: http://www.example.com/?%E5%AD%A6%E7%94%9F=alice
When this URL is redacted in GA4, the value associated with the parameter 学生 will be masked.
Redacted Version: http://www.example.com/?%E5%AD%A6%E7%94%9F=(redacted)
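The relationship between the readable parameter name and its percent-encoded form can be checked with Python's standard library. The helper names below are hypothetical and only illustrate the encoding. They say nothing about how GA4 matches keys internally.

```python
from urllib.parse import quote, unquote


def encode_param_key(key: str) -> str:
    """Percent-encode a parameter name the way it appears in a URL (UTF-8)."""
    return quote(key, safe="")


def same_param_key(raw_key: str, readable_key: str) -> bool:
    """Check whether an encoded key from a URL decodes to the readable name."""
    return unquote(raw_key) == readable_key


if __name__ == "__main__":
    print(encode_param_key("学生"))
    print(same_param_key("%E5%AD%A6%E7%94%9F", "学生"))
```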
#5 Potential Over-Redaction
There is a possibility that GA4 might incorrectly interpret text as an email address and redact it.
It can occur if the text includes an “@” symbol followed by something that looks like a top-level domain (e.g., “.com”), even if it’s not an actual email address.
For example, consider the following URL:
https://www.example.com/profile?username=social@network
In this URL, ‘username=social@network’ is the query parameter, where “social@network” is meant to be a social media handle or a unique identifier.
However, because this text includes an “@” symbol followed by what appears to be a top-level domain (“network” in this case, akin to “com” in “example.com”), GA4’s data redaction feature might misinterpret “social@network” as an email address.
As a result, it could redact this information, altering the URL in the GA4 reports to something like:
https://www.example.com/profile?username=(redacted)
This example demonstrates the potential for GA4’s data redaction feature to mistakenly redact non-email text that coincidentally follows an email-like format.
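The trade-off can be illustrated with two hypothetical patterns. GA4's actual detection logic is not described here, so both regexes below are assumptions chosen only to show how a looser email check catches more non-email text than a stricter one.

```python
import re

# Hypothetical loose check: any text around an "@" sign.
LOOSE_EMAIL = re.compile(r"[^\s@=&?]+@[^\s@=&?]+")
# Hypothetical stricter check: also requires a dot followed by letters.
STRICT_EMAIL = re.compile(r"[^\s@=&?]+@[^\s@=&?]+\.[a-zA-Z]{2,}")


def looks_like_email(text: str, pattern: re.Pattern) -> bool:
    """Return True if the pattern finds an email-like substring in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ["social@network", "jane@example.com"]:
        print(
            sample,
            looks_like_email(sample, LOOSE_EMAIL),
            looks_like_email(sample, STRICT_EMAIL),
        )
```

A looser pattern misses fewer real email addresses but flags more handles and identifiers, which is the kind of over-redaction described above.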
#6 Does Not Evaluate HTTP-Header Values
HTTP headers are part of the HTTP request and response messages. They carry important information about the client browser, the requested page, the server, and more.
One such header is the ‘referer’ header, which indicates the page URL directing the user to the current page.
In some scenarios, particularly with older browsers, the ‘referer’ header can include the full URL of the previous page, complete with its query parameters. These query parameters could potentially contain PII data.
While GA4 can redact query parameters in URLs within the data it collects directly, it cannot assess and redact information contained within HTTP headers.
So if sensitive information, like PII, is passed through query parameters in the ‘referer’ header, GA4’s data redaction feature won’t be able to detect or redact this information.
It’s worth noting that modern browsers have improved the handling of the ‘referer’ header, often truncating the URL to just the domain or path, without query parameters, to enhance privacy.
However, reliance on browser behaviour should not substitute for proper data handling practices.
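One possible data handling practice, sketched under the assumption that you process referrer values yourself (for example in server logs or a server-side pipeline), is to drop the query string and fragment before storing or forwarding them. The function name `strip_query` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    # Made-up referrer value containing an email address in the query string.
    print(strip_query("https://www.example.com/signup?email=jane@example.com#top"))
```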
#7 Inability to Block PII via Other Methods
When using the Measurement Protocol or Data Import, there is a possibility that the data being sent to GA4 includes PII.
The data redaction feature in GA4 is designed to work with data collected through standard web tracking mechanisms.
However, this feature does not extend to data sent via the Measurement Protocol or imported via Data Import.
So, if PII is included in the datasets sent through these methods, GA4’s data redaction feature will not automatically redact it.
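Because redaction does not cover these routes, any scrubbing has to happen in your own code before the data leaves your systems. The sketch below walks a Measurement Protocol style payload (a dictionary with `client_id` and `events`) and masks email-like strings. The `scrub` function and the email pattern are illustrative assumptions, and the snippet does not send anything to GA4.

```python
import re
from typing import Any

# Assumed pattern for this sketch. Adjust it to the PII you need to catch.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9_.+-]+@[\da-zA-Z.-]+\.[a-zA-Z]{2,}")
REDACTED = "(redacted)"


def scrub(value: Any) -> Any:
    """Recursively mask email-like substrings in strings, dicts and lists."""
    if isinstance(value, str):
        return EMAIL_PATTERN.sub(REDACTED, value)
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value  # numbers, booleans and None pass through unchanged


if __name__ == "__main__":
    # Hypothetical payload with a leaked email address in page_location.
    payload = {
        "client_id": "123.456",
        "events": [
            {
                "name": "sign_up",
                "params": {
                    "method": "email",
                    "page_location": "https://www.example.com/welcome?email=jane@example.com",
                    "value": 10,
                },
            }
        ],
    }
    print(scrub(payload))
```

The same function could be run over rows of a Data Import file before upload, since it works on any nested mix of dictionaries, lists and strings.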
Conclusion
While data redaction in GA4 is a powerful tool, blocking sensitive information at the source is best.
This feature should be treated as a secondary layer of protection.
Ensure sensitive information is not included in URLs, and remember that this feature does not apply to data sent via Measurement Protocol or Data Import.
Regularly test and update your redaction settings to maintain compliance with privacy regulations.