How to Remove Emails and Other PII From URLs in Google Tag Manager


This post was written by Stephen Harris and Sasha Helms.


Google Analytics (GA) is a powerful tool that allows you to collect a lot of data and extract valuable insights from it. However, some information that makes its way into your reports can be less useful for your analysis or even harmful to user privacy. One such example is Personally Identifiable Information (PII).

💡 This article explains how to modify your existing setup in Google Tag Manager (GTM) to mitigate PII in both GA4 and Universal Analytics (UA).

Many of you will be reading this for GA4, as by now most work in UA is complete. But until 2023 H1*, if there is PII being sent to UA, fixing it will continue to be a critical hotfix until you no longer rely on UA for production reporting.

* Or at the latest Q3 for 360 properties, based on Google’s UA deprecation timeline.

What is PII?

PII, or Personally Identifiable Information, is information about a user that can directly or indirectly (when tied with other data) identify them as an individual. It is against Google’s Terms of Service (TOS) to collect PII about your visitors/users in your Google Analytics account.

All data from any days where any PII was tracked is at risk of deletion.

The Office of Privacy and Open Government defines PII as:

“Information which can be used to distinguish or trace an individual’s identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother’s maiden name, etc.”

PII Examples

  • Names
  • Emails
  • Physical addresses
  • Phone numbers
  • Internet protocol (IP) addresses
  • Personal identification numbers
    • Passport number
    • Social security numbers
    • Driver’s license
  • Financial account information
  • Physical identifiers
    • Photographs
    • Fingerprints
    • Voice signature
    • Medical records


How Does PII Get into Google Analytics?

PII usually gets into GA accidentally through URL strings.

When a user submits a form on your site or uses a search feature, the submitted data might be sent to the server by appending it as query parameters to the form submission request URL.

If that submission URL is not redirected to a confirmation page without the parameters, then by default everything the user typed into the form fields will get tracked in GA as part of the pageview URL. For example, a newsletter signup might be recorded as a pageview of /thank-you?submit=true&email=jane%40example.com.


Checking Your Data for PII

We recommend checking your UA data (in an unfiltered view) to determine which URL parameters will need to be screened.

💡 This PII report link generator for UA will help you identify potentially offending parameters in your UA data to inform your blocklist for both GA4 and UA.

For your reference, the following is a list* of regexes used by that template:

* This list is non-comprehensive and may produce false positives; you should review matches to confirm validity.

PII Category Regular Expression
Phone Numbers – USA [=,;]\s*(\+\s*\d{1,3})?[-,.+\s(]*\d{3}[-,.+\s)]*\d{3}[-,.+\s]*\d{4}($|[,;:/?&#])
Phone Numbers – International \?.*([=:\,!]|%2[1C])(([\s+.\,)(-]|%2[0B1C89])*\d){11\,15}($|[&#:\,!%])
Physical Address (\d+\ )(([^\s]*)|([^\s]*\ ([^\s]*)))\ ((st(reet)?|ave(nue)?|dr(ive)?|(high)?way|la?ne?|r(oa)?d|b(ou)?le?v(ar)?d))
Zip Code [=,]\d{5}(-\d{4})?($|[&#,])
Email Address [^&?#/](@|%40)([^&?#/]+)\.
CC – Visa/MC [=,;]\s*(\d{4}[-\s+]*){3}\d{4}($|[,;:/?&#])
CC – Amex [=,;]\s*\d{4}[-\s+]*\d{6}[-\s+]*\d{5}[-\s+]*($|[,;:/?&#])
Common parameters – password [?&,;](pwd?|password)=[^&#]
Common parameters – name [?&,;](f|l|u|s|full|first|last|user|screen)?name=[^&#]
Common Names – 1 \b(J(im(my)?|ohn|ames)|Robert|Bob(by)?|Michael|(B|W)illy?(iam)?|Dav(id|e)|(D|R)ic(k|hard)|Ch(arl(es|ie)|uck))\b
Common Names – 2 \b(Mary|Pat(ty|ricia)|Linda|Barb(ara)?|E?liz(zy|abeth)|Jenn?(ifer)?|Maria|Su(e|san))\b
Social Security # (SSN) [=,;]\s*\d{3}[-\s+]*\d{2}[-\s+]*\d{4}($|[,;:/?&#])
IP Address – IPv4 [^vn][=,;]\s*((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)($|[-,;:/?&#])
IP Address – IPv6 [=,;]\s*(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))($|[-,;/?&#])
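If you can export URLs from your reports as plain strings, a quick scan with a few of the patterns above can help shortlist candidate parameters before you build your blocklist. This is a minimal sketch under that assumption: `PII_PATTERNS` and `findPiiMatches` are hypothetical names, the sample URLs are made up, and like the table itself it will miss some PII and flag some false positives.

```javascript
// A few patterns copied from the table above, as JavaScript regex literals.
var PII_PATTERNS = {
  'Phone Numbers - USA': /[=,;]\s*(\+\s*\d{1,3})?[-,.+\s(]*\d{3}[-,.+\s)]*\d{3}[-,.+\s]*\d{4}($|[,;:/?&#])/,
  'Zip Code': /[=,]\d{5}(-\d{4})?($|[&#,])/,
  'Email Address': /[^&?#/](@|%40)([^&?#/]+)\./
};

// Return the names of every pattern that matches the given URL.
function findPiiMatches(url) {
  return Object.keys(PII_PATTERNS).filter(function (name) {
    return PII_PATTERNS[name].test(url);
  });
}

// Hypothetical report URLs.
[
  '/thank-you?submit=true&email=jane%40example.com', // → ['Email Address']
  '/contact?phone=215-555-0100&zip=19103',           // → ['Phone Numbers - USA', 'Zip Code']
  '/blog/post?utm_source=newsletter'                 // → []
].forEach(function (url) {
  console.log(url, findPiiMatches(url));
});
```

Any URL that comes back with a match is a prompt for manual review, not proof of PII.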

Fixing PII at the Source

If you see PII in your Google Analytics reports, you should talk to the IT or web development team responsible for the maintenance and functionality of the website or app that is submitting the PII.

The best way to prevent this is to encrypt the request values that contain PII or remove them from URLs entirely. This is the preferable solution, as it also safeguards the data and protects user privacy the most.

💡 In many cases, one of the leanest and most effective solutions – if possible – is to change the form method from GET to POST.

This will send submitted data as a payload in the body of the request instead of in the request URL. This is a general privacy/security best practice for many reasons beyond web analytics integrity, as there are multiple ways URLs can be “leaked” in transit, on servers, and on the web.
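To make the difference concrete, here is a simplified sketch of how a browser serializes the same form fields for each method (as application/x-www-form-urlencoded). `buildFormRequest` and the field names are hypothetical, and real browsers handle more cases (file inputs, character encodings, an existing query string on the action URL).

```javascript
// Simplified model of a form submission; not a browser API.
function buildFormRequest(action, method, fields) {
  var encoded = new URLSearchParams(fields).toString(); // "@" becomes "%40", etc.
  if (method.toUpperCase() === 'GET') {
    // GET: the fields become query parameters, so they appear in the URL
    // of the next page and in any pageview tracked for it.
    return { url: action + '?' + encoded, body: null };
  }
  // POST: the URL stays clean and the fields travel in the request body.
  return { url: action, body: encoded };
}

buildFormRequest('/thank-you', 'GET', { email: 'jane@example.com', submit: 'true' });
// → { url: '/thank-you?email=jane%40example.com&submit=true', body: null }
buildFormRequest('/thank-you', 'POST', { email: 'jane@example.com', submit: 'true' });
// → { url: '/thank-you', body: 'email=jane%40example.com&submit=true' }
```

In the page’s HTML this usually corresponds to method="post" on the form element, provided the server-side handler is also updated to read the request body.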

But chances are, you need a fix as quickly as possible. So unless you’re lucky enough to have the resources to implement it very quickly, we advise submitting that request and simultaneously deploying a tracking update to screen PII from data collection. And if encrypting or removing the sensitive data at the source is not possible, the following approach will be a suitable permanent solution for GA (and other PII-allergic platforms tracked via GTM).

For property/processing configuration such as filtering, GA4 looks quite different from Universal Analytics – but none of it is suitable for screening PII anyway.

Even if it’s filtered out of your reports, the moment PII reaches Google’s servers it has violated GA’s TOS. It is also likely to violate the offending website’s privacy policy. So PII screening must be done on the tracking side, and we’re going to show you how to use Google Tag Manager to prevent this data from passing to Google Analytics.

How to Remove PII from GA Using GTM

While GA4 and UA tags look different, the solution we provide will work for both (and for any other platform tracked via GTM).

Steps to Remove PII

  1. Choose a parameter screening method: Allow vs. Block
  2. Sanitize URLs with JavaScript
  3. Send Clean Data to GA
  4. Test & Publish

The core logic lives in a custom JavaScript variable that we can use to strip parameters from any given URL. This variable will become part of your Google Analytics configuration for both Universal Analytics and GA4.

Allowing vs. Blocking Parameters

First, you’ll need to decide whether you’ll use an allowlist or a blocklist method.

Like many, we’ve retired our use of the terms “whitelist” and “blacklist” – we’re done associating white with good and black with bad. In their place we’ve adopted more inclusive terms: “allowlist” and “blocklist” (matching GTM’s convention).

The method you choose will govern how query parameters are handled by default. With a blocklist, you allow all query parameters to come through except those on the list. The allowlist works the other way, removing all parameters by default except those you need to keep and put on the list.
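The two methods differ only in the default applied to a parameter that is not on the list. A minimal sketch, where `keepParam` and the sample lists are hypothetical:

```javascript
// Decide whether a query parameter key survives sanitization.
function keepParam(key, method, list) {
  var listed = list.indexOf(key) !== -1;
  // Allowlist: drop by default, keep only listed keys.
  // Blocklist: keep by default, drop only listed keys.
  return method === 'allowlist' ? listed : !listed;
}

var keys = ['utm_source', 'email', 'sort'];
keys.filter(function (k) { return keepParam(k, 'allowlist', ['utm_source', 'gclid']); });
// → ['utm_source']
keys.filter(function (k) { return keepParam(k, 'blocklist', ['email']); });
// → ['utm_source', 'sort']
```

Note how the unlisted `sort` parameter survives the blocklist but not the allowlist; that default is the whole trade-off discussed below.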

Allowlist Method

Seriously consider stripping all URL parameters by default. This eliminates a world of potential noise and focuses you on determining which parameters provide data valuable to your measurement strategy.

With this approach, we use an “allowlist” to designate specific parameters as the only parameters we want GA to pick up. In this case, we just want to make sure we’re still allowing key platform parameters such as UTMs, gclid, and gbraid.

💡 To help you identify all the parameters in your UA data, you can use this template from Google.

Implementing an allowlist is not something you should do hastily, as it will remove all non-specified parameters. So before publishing, consult with all teams that rely on web analytics, and ensure any parameters important to them are included in the allowlist.

Additionally, they should be aware of the publish date, because page-level metrics may increase on pages that had previously been splintered across many report rows.
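To illustrate why page-level metrics can jump, the sketch below counts hypothetical pageviews per distinct URL before and after stripping parameters. `stripAllParams` is an assumed stand-in for an allowlist with no allowed keys, not the article’s GTM variable.

```javascript
// Hypothetical pageviews of one page, split up by different parameters.
var pageviews = ['/pricing?ref=nav', '/pricing?ref=footer', '/pricing?sessionid=123', '/pricing'];

// Stand-in for an empty allowlist: drop everything from the first "?" on.
function stripAllParams(url) {
  return url.split('?')[0];
}

// Count pageviews per distinct URL string, i.e. per report row.
function countRows(urls) {
  var counts = {};
  urls.forEach(function (url) {
    counts[url] = (counts[url] || 0) + 1;
  });
  return counts;
}

countRows(pageviews);                     // 4 rows with 1 pageview each
countRows(pageviews.map(stripAllParams)); // { '/pricing': 4 }
```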

Blocklist Method

If time constraints don’t allow for the coordination required to adopt an allowlist, or you are concerned about not having a backup for valuable parameters you might miss with an allowlist, then you can use a blocklist to remove only the parameters you specify.

💡 Refer to Checking Your Data for PII for tips on using your analytics data to identify PII parameters.

Sanitizing URLs with Custom JavaScript

In Google Tag Manager, create a new user-defined variable. Name it ‘Function – Strip Parameters’ and set the type to Custom JavaScript.

The code you use depends on the method you chose. Copy the version below that matches it and paste it into the `Function – Strip Parameters` variable.

Allowlist Code

function(){

  var allowlist = ('FIVE,EXAMPLE,ALLOWED,CUSTOM,PARAMS'
      + ',utm_campaign,utm_content,utm_medium,utm_source,utm_term'
      + ',utm_creative_format,utm_marketing_tactic,gbraid,wbraid,gclid,dclid')
      .split(','), // parentheses make .split() apply to the whole joined string
      replaceWith = ''; // If empty, parameters not on the allowlist are dropped entirely;
                        //   otherwise their values are overridden with this value.

  return function sanitizeUrl( url ){
    return url.replace( /((\?)|&)([^#&=]+)(?:=([^#&]*))?/g, function(input,delim,qmark,key,val){
      if( -1 !== allowlist.indexOf(key) )
        return input;
      else return replaceWith ? delim+key+'='+replaceWith : qmark||'';
    }).replace(/\?&*$|(\?)&+/,'$1'); // tidy up a leftover "?" or "?&"
  }
}

Blocklist Code

function(){

  var blocklist = 'FIVE,EXAMPLE,BLOCKED,CUSTOM,PARAMS'.split(','),
      replaceWith = ''; // If empty, blocked parameters are dropped entirely;
                        //   otherwise their values are overridden with this value.

  return function sanitizeUrl( url ){
    return url.replace( /((\?)|&)([^#&=]+)(?:=([^#&]*))?/g, function(input,delim,qmark,key,val){
      if( -1 === blocklist.indexOf(key) )
        return input;
      else return replaceWith ? delim+key+'='+replaceWith : qmark||'';
    }).replace(/\?&*$|(\?)&+/,'$1'); // tidy up a leftover "?" or "?&"
  }
}

This variable defines the parameters you want to keep (allowlist) or remove (blocklist) and returns a function that takes a URL string and removes or replaces the specified parameters within it.

Replace the values in the allowlist or blocklist variable with your own parameter keys, using commas as delimiters and making sure there are no spaces around the commas. For example, if you are using the blocklist method and want to block the email, phone, and address parameters, the list line can look like this:

  var blocklist = 'email,phone,address'.split(','),
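To try the blocklist logic outside GTM (for example in Node or a browser console), the sketch below copies the variable’s body into an immediately invoked function, configured with the email, phone, and address example. The sample URLs are hypothetical; inside GTM you would keep the anonymous-function form shown above.

```javascript
var sanitizeUrl = (function () {
  var blocklist = 'email,phone,address'.split(','),
      replaceWith = ''; // empty: blocked parameters are dropped entirely

  return function sanitizeUrl(url) {
    // Visit every "?key=value" or "&key=value" pair in the URL.
    return url.replace(/((\?)|&)([^#&=]+)(?:=([^#&]*))?/g, function (input, delim, qmark, key) {
      if (blocklist.indexOf(key) === -1) return input; // not blocked: keep as-is
      // Blocked: redact the value, or drop the pair but keep a leading "?".
      return replaceWith ? delim + key + '=' + replaceWith : qmark || '';
    })
    // Remove a trailing "?" (plus any "&"), or collapse "?&&" into "?".
    .replace(/\?&*$|(\?)&+/, '$1');
  };
})();

sanitizeUrl('https://example.com/thank-you?email=jane%40example.com&phone=2155550100&submit=true');
// → 'https://example.com/thank-you?submit=true'
sanitizeUrl('https://example.com/contact?address=1%20Main%20St');
// → 'https://example.com/contact'
```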

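Remember that the parameter keys must be written exactly as they appear in the URL, because the list lookup (indexOf) is case-sensitive. Adding an `i` flag to the regular expression (/((\?)|&)([^#&=]+)(?:=([^#&]*))?/gi) does not change this, since that regex only extracts each key and never compares it to the list. To make the match case-insensitive, write the list entries in lowercase and lowercase each key before the lookup, for example in the blocklist version:

      if( -1 === blocklist.indexOf(key.toLowerCase()) )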
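The case-insensitive variant can be checked the same way outside GTM. This sketch assumes lowercase list entries and differs from the blocklist code only in the `key.toLowerCase()` lookup; the sample URLs are hypothetical.

```javascript
var sanitizeUrl = (function () {
  var blocklist = 'email,phone,address'.split(','), // entries written in lowercase
      replaceWith = '';

  return function sanitizeUrl(url) {
    return url.replace(/((\?)|&)([^#&=]+)(?:=([^#&]*))?/g, function (input, delim, qmark, key) {
      // Lowercase the key so "Email", "EMAIL" and "email" are all blocked.
      if (blocklist.indexOf(key.toLowerCase()) === -1) return input;
      return replaceWith ? delim + key + '=' + replaceWith : qmark || '';
    }).replace(/\?&*$|(\?)&+/, '$1');
  };
})();

sanitizeUrl('https://example.com/thank-you?Email=jane%40example.com&submit=true');
// → 'https://example.com/thank-you?submit=true'
```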

If you want to keep the parameters and only redact their values, you can set replaceWith to the replacement value. For example, the configuration replaceWith = '[REDACTED]'; would result in a sanitized URL that looks like this:

https://www.seerinteractive.com/thank-you?submit=true&email=[REDACTED]

Sending the Sanitized Data to GA

Now we’ll create a couple of other variables that use our function to produce sanitized URL strings, and update the GA tracking configuration to use those sanitized values.

Create a Custom JavaScript Variable called “Page URL (Sanitized)” and paste the following code into it:

function(){ return {{Function – Strip Parameters}}(location.href); }

In this code snippet, we call the function we defined first and pass in location.href as an argument to get a sanitized version of the full page URL. We will use this variable in both the Universal Analytics and GA4 configuration settings.

If you have custom tracking that pulls in other URLs that need to be screened as well, you can use the same logic. For example, to sanitize Click URL:

function(){ return {{Function – Strip Parameters}}( {{Click URL}} ); }

Now it’s time to put these new variables to use in your tracking configuration.

For GA4

Go to your GA4 configuration tag. Under “Fields to Set”, add a new row.

Set the ‘Field Name’ to ‘page_location’, and the value to your {{Page URL (Sanitized)}} variable. (If this field is already defined, you’ll instead want to combine the new parameter stripping logic with your existing customizations.)

If page URLs are passed anywhere else (e.g. in custom parameters), you should follow the same pattern to sanitize those fields as well.

For Universal Analytics

Create another Custom JavaScript Variable called “UA Pageview URL (Sanitized)”. This variable will be used with the Universal Analytics configuration only:

function(){ return {{Function – Strip Parameters}}(location.pathname + location.search); }

If you have set up your Universal Analytics tag correctly, you should have a GA Settings variable that contains your Universal Analytics configuration settings. Open this variable’s settings and add two “Fields to Set”:

  1. page = {{UA Pageview URL (Sanitized)}}
  2. location = {{Page URL (Sanitized)}}
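To see how the two UA fields differ from the GA4 field, the sketch below mimics the three variables outside GTM. It is a minimal sketch under stated assumptions: `stripParameters` is a hypothetical stand-in for {{Function – Strip Parameters}} that drops the whole query string (like an allowlist with nothing allowed), a `URL` object replaces the browser’s location object, and `buildTrackingFields` is an illustrative name.

```javascript
// Hypothetical stand-in for {{Function – Strip Parameters}}: removes the entire
// query string (from "?" up to any "#"), like an allowlist with no entries.
function stripParameters(url) {
  return url.replace(/\?[^#]*/, '');
}

// Mirrors the GTM variables described above, for a given page URL.
function buildTrackingFields(href) {
  var loc = new URL(href); // stand-in for window.location
  var pageUrlSanitized = stripParameters(loc.href); // "Page URL (Sanitized)"
  return {
    ga4: { page_location: pageUrlSanitized },
    ua: {
      location: pageUrlSanitized,
      // "UA Pageview URL (Sanitized)": path and query only, no host or hash
      page: stripParameters(loc.pathname + loc.search)
    }
  };
}

var fields = buildTrackingFields('https://example.com/thank-you?email=jane%40example.com#top');
// fields.ga4.page_location → 'https://example.com/thank-you#top'
// fields.ua.page           → '/thank-you'
```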


Test & Publish

Test your solution in GTM Debug Mode.

Test on-page with the URL parameters you included in your allowlist/blocklist code and take a look at how your new variables are populating. You should see the unaltered version of the URL as the value of the default Page URL variable, and the sanitized version as the value of your custom variable.

You should also check your tags and make sure they fired with the correct data.

As due diligence before publishing, you may also want to confirm that any configurations based on URL parameters (e.g. goal configurations) will not break if some parameters disappear.

For example, a form might always have appended a few parameters:

‘/thank-you?email=user@example.com&success=1’

If the goal configuration looked for “&success=1” and we’re blocking ‘email’, this goal configuration will break, because once the email parameter is removed, the URL will contain “?success=1” instead.

Never rely on parameter order when matching URLs!

Instead, always put [?&] right before the parameter key in your regex to match that parameter at any position. There is no need to escape the ‘?’ character here: since it is inside the character class brackets [ ], it is treated as a literal question mark. For example: `[?&]success=1(&|#|$)`.
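A quick way to sanity-check a goal pattern is to run it against URLs from before and after the change. This sketch uses the recommended pattern above; the URLs are hypothetical, and `brittle` stands in for a goal that relied on parameter order.

```javascript
// Goal pattern that depends on parameter order (breaks once "email" is stripped).
var brittle = /&success=1/;
// Recommended pattern: [?&] matches the key at any position; (&|#|$) ends the value.
var robust = /[?&]success=1(&|#|$)/;

var before = '/thank-you?email=user%40example.com&success=1'; // hypothetical
var after = '/thank-you?success=1';                          // same URL, email stripped

brittle.test(before); // true
brittle.test(after);  // false: the goal silently stops matching
robust.test(before);  // true
robust.test(after);   // true
robust.test('/thank-you?success=10'); // false: the value must end right after "1"
```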

If everything is working as expected, you’re free to publish your changes. Verify them using GA’s Realtime report(s).

Finally, whether you go with an allowlist or a blocklist, it’s wise to keep documentation on how you’re manipulating query parameters.

UA Alternative: Sanitize All Fields via Custom Task

Depending on your setup, PII may be passed in fields other than the tracked URL.

The above approach can be applied to any field, but it requires that each such field be identified and explicitly set with its own dedicated GTM variable. While this is the most accessible approach for GTM users, since it leverages only GTM features, there is a more flexible option for UA that can strip PII from all fields using one block of code and doesn’t require identifying every offending field upfront.

This method relies on the customTask API available in Universal Analytics.

In simple terms, customTask is one of the tasks that run between the moment the tracking data is collected and the HTTP request that sends it to the analytics server. It gives you a way to access and modify the request data, and it also lets you modify the other standard tasks that run after it.

💡 While customTask can be used in a lot of creative ways, we will use it to strip sensitive data from the analytics data collection request. Brian Clifton has a great implementation guide for this that can be found here.
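For orientation only, here is our own minimal sketch of the customTask pattern (not Brian Clifton’s implementation). It relies on the analytics.js Tasks API: customTask receives the tracker model, and wrapping `sendHitTask` lets it rewrite the `hitPayload` string just before the hit is sent. `EMAIL_PATTERN` and `redactPayload` are hypothetical, and the pattern is only an email heuristic; you would need to extend it for other kinds of PII.

```javascript
// Hypothetical heuristic for email addresses, raw ("@") or percent-encoded ("%40").
var EMAIL_PATTERN = /[^\s&?#\/=]+(@|%40)[^\s&?#\/]+\.[a-z]{2,}/gi;

// Redact email addresses from every parameter value in a hit payload
// (a "key=value&key=value" string with URI-encoded values).
function redactPayload(hitPayload) {
  return hitPayload.split('&').map(function (pair) {
    var i = pair.indexOf('=');
    if (i === -1) return pair;
    var value;
    try {
      value = decodeURIComponent(pair.slice(i + 1));
    } catch (e) {
      return pair; // leave malformed values untouched
    }
    var redacted = value.replace(EMAIL_PATTERN, '[REDACTED]');
    // Re-encode only values that changed, so other parameters stay byte-identical.
    return redacted === value ? pair : pair.slice(0, i + 1) + encodeURIComponent(redacted);
  }).join('&');
}

// customTask receives the tracker model; wrapping sendHitTask lets it
// rewrite hitPayload right before the original task sends the hit.
function customTask(model) {
  var originalSendHitTask = model.get('sendHitTask');
  model.set('sendHitTask', function (sendModel) {
    sendModel.set('hitPayload', redactPayload(sendModel.get('hitPayload')), true);
    originalSendHitTask(sendModel);
  });
}
```

In GTM, this would typically be wired up by returning the function from a Custom JavaScript variable (function(){ return function(model){ … }; }) and setting a customTask field to that variable in the GA Settings variable.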

Unfortunately, there is no equivalent to customTask for GA4.

Get Started

Step 1: Download Template

To get started, download this GTM container, which includes both the blocklist and allowlist logic and the sanitized URL variables for GA4.


Step 2: Import into Your GTM

Import this into your GTM container, and plug it into the relevant tags:

  • GA4 – Under ‘Fields To Set’ in the GA4 config tag, map the sanitized page URL to `page_location` (as shown above).
  • UA – Under ‘Fields To Set’ in the GA Settings variable(s), map the sanitized page URL to `location` and map a sanitized UA Pageview URL variable to the `page` field (as shown above).
  • Other – Determine which fields track URLs and their formatting requirements; if needed, create a sanitized URL variable for each required format, and override the offending fields in the platform’s tracking tag/configuration with the sanitized URL variable(s).

Step 3: QA and Publish

Then modify the ‘Function – Strip Parameters’ variable to strip the offending parameters, QA, and publish!

And remember to make sure you’re not losing any valuable parameters before you opt for an allowlist; if you’re unsure, it’s safer to default to a blocklist.

What Now?

PII Auditing Support

Seer can help! We bring our analytics expertise to the table by enabling you and your team to act on your data. Whether you’re looking for a solid foundation or a long-term partnership, we have a team to fit your needs.
