February 10, 2023
Thresholding applied in Google Analytics 4? Do this
Here’s something that you might often face in Google Analytics 4. You open a report and see an orange exclamation mark at the top of the report. You click it and see this warning saying “Thresholding applied” (even though it also says that the report is unsampled):
How can that be? The report is unsampled, but that “threshold” sounds like sampling, where you get only a portion of the data you captured.
In this blog post, I will explain what thresholding is, what happens when it’s applied, and how to avoid it.
If you are in a hurry and just need the solution
In this article, I explain thresholding, why it happens, how to avoid it, and a workaround. But if you are in a hurry and just want the solution, you can jump straight to this chapter.
Also, you can watch this YouTube short from my channel.
Table of contents
- What is causing this?
- What is the impact of Thresholding in Google Analytics 4?
- Why is Google doing this?
- How to avoid Thresholding in GA4?
- What to do if you see a Thresholding warning?
- It’s not the end of the world (or is it?)
- Final words
What is causing this?
Thresholds in Google Analytics 4 are caused by a feature called Google Signals. It is disabled by default, but if you turn it on, things might get weird.
Why would you want to enable Google Signals in the first place? There are at least two reasons. But first, let’s quickly learn what Google Signals is in general.
Google Signals enables the tracking of users across devices and platforms. When enabled, Google Signals collects data from users who have signed in to a Google account and have enabled the feature in their Google Account settings. This data is then used to provide insights into your audience’s demographics, interests, and other characteristics. You can learn more about it here.
If Google Signals is active, your GA4 property will collect more data and unlock certain features. That’s where we come to at least two reasons why people might want to enable Google Signals:
- It will start populating demographic data in GA4
- It lets you reuse Google Analytics audiences as retargeting audiences in Google Ads (thus, you can show more targeted ads to them)
But together with that, we get one caveat, thresholding.
What is the impact of Thresholding in Google Analytics 4?
If you are looking at a report and the property contains data from Google Signals, Google Analytics will hide rows with small user numbers. I don’t know the exact number, but it looks like something below 50 users/events per row.
So if you are looking at a Traffic Acquisition report and some traffic sources generated fewer than 50 users in that timeframe, the GA4 interface will hide that data. It is still stored in the database, but it’s not displayed.
Here’s an example. I know that there are hundreds of unique traffic sources driving visitors to a website (I checked Universal Analytics). But if data thresholding kicks in, you will see only those that drove more than 50 (or so) users.
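To make the effect more concrete, here is a minimal Python sketch of how row hiding might behave. It is only an illustration: Google does not publish the real thresholds, so `ASSUMED_MIN_USERS = 50` is my rough guess from above, and the function name and the traffic sources are made up for this example.

```python
# Illustrative only: GA4 does not publish its thresholds. The value 50 is
# this article's rough guess, not an official number.
ASSUMED_MIN_USERS = 50


def apply_threshold(rows, min_users=ASSUMED_MIN_USERS):
    """Split report rows into (visible, hidden) lists.

    Rows below the assumed minimum are hidden from the report,
    but they are not deleted: they are returned separately.
    """
    visible = [r for r in rows if r["users"] >= min_users]
    hidden = [r for r in rows if r["users"] < min_users]
    return visible, hidden


# Made-up traffic sources for the example.
rows = [
    {"source": "google / organic", "users": 1200},
    {"source": "newsletter / email", "users": 310},
    {"source": "partner-blog / referral", "users": 42},
    {"source": "forum / referral", "users": 7},
]

visible, hidden = apply_threshold(rows)
print("Shown in the report:", [r["source"] for r in visible])
print("Hidden rows:", len(hidden))
```

In this toy data set, the two referral sources fall below the assumed limit, so only the first two rows would show up in the report.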
Why is Google doing this?
Officially, they say this is to prevent us (GA users) from identifying individual users based on the data that Google Signals adds to our reports (e.g., age, gender, etc.).
Honestly, I have no idea how I could identify a user based on that (because, for example, Google Signals data is not exported to BigQuery), but that’s Google’s position. And there isn’t much we, as GA users, can do here. Thresholds are system defined, and we cannot adjust them.
How to avoid Thresholding in GA4?
First, let’s look at what you can do beforehand to avoid data thresholding in Google Analytics 4. The answer is fairly simple: do not enable Google Signals.
If you don’t plan to use Demographic reports and you don’t plan to use GA4 audiences for remarketing in Google Ads, you should not enable Signals.
If, on the other hand, you really need one of those two features, I am afraid there is no natural way to completely prevent this issue. There is one workaround, but it has its caveats (I will explain it in the next chapter).
What if you have enabled Google Signals in the past and you disable it now? Will it help?
It will help with future data. If the time range you analyze no longer includes Signals data, thresholding should not be applied. But if you analyze more extended periods that have older data with Signals, thresholding will kick in again.
What to do if you see a Thresholding warning?
If you are on this page, this means that you already enabled Google Signals in the past, and you are facing this nasty issue. Now what?
One workaround will help you turn off thresholding – changing the default reporting identity. But there’s a caveat too. First, let me explain where to change this, and then I will explain the implications.
Default reporting identity is a feature that affects how Google Analytics calculates users of your website/app. Should it use only cookie data? Should it also use User ID data (that you may be already sending to GA)? Should Google Signals data be included too?
You can change it by going to Admin > Reporting Identity.
Here you will see two options (but actually, there are three). Click Show all.
- Device-based reporting identity is the most basic. It will use just Device ID (a.k.a. first-party cookie). If the same user uses multiple browsers/devices, GA will treat that as separate users.
- Observed is a bit more advanced. It uses cookie data, Google Signals data (if you enabled it), and user ID (if you are tracking that too). Things such as user ID or Google Signals data can help GA to deduplicate certain users and understand that a person using several devices might still be the same person.
- Blended is the most advanced. It includes all the previous identity methods, plus it uses machine learning to fill in the gaps and model data. You need to implement Google consent mode to unlock this feature.
If you use Observed or Blended reporting identity (and you have collected data from Google Signals), thresholding will probably be applied.
BUT if you switch to Device-based, then Google Signals will not be used to calculate users, and thresholding will go away.
The good thing about reporting identity is that you can switch/change this as many times as you want and whenever you want. The data stored in GA’s database will not be affected. And reporting identity is applied retroactively too.
So in most cases, you can continue using Observed identity, and if you are curious about rows with small numbers, you can quickly switch to the device-based identity back and forth.
Just remember that when you use device-based, things like User ID are not taken into the calculation of your reports, thus user counts will be less accurate. So that’s the main caveat.
Don’t worry. Reporting identity does not affect the data collection. So if you switched to device-based (while your GA4 is collecting user IDs), all data would be collected. But it won’t be used in user calculations until you switch back to observed or blended identity.
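If it helps, the relationship between reporting identity and thresholding can be summarized in a small Python sketch. This is just my mental model of the behavior described above, not anything from Google's code or API: the enum, the `IDENTITY_SOURCES` mapping, and the `thresholding_possible` function are all hypothetical names.

```python
from enum import Enum


class ReportingIdentity(Enum):
    DEVICE_BASED = "device-based"
    OBSERVED = "observed"
    BLENDED = "blended"


# Identity sources each option draws on, as described in the list above.
# The labels are made up for this sketch.
IDENTITY_SOURCES = {
    ReportingIdentity.DEVICE_BASED: {"device_id"},
    ReportingIdentity.OBSERVED: {"device_id", "user_id", "google_signals"},
    ReportingIdentity.BLENDED: {"device_id", "user_id", "google_signals", "modeling"},
}


def thresholding_possible(identity, has_signals_data):
    """Return True if thresholding can kick in for this setup.

    Assumption: it needs both collected Google Signals data and a
    reporting identity that actually uses that data.
    """
    return has_signals_data and "google_signals" in IDENTITY_SOURCES[identity]


print(thresholding_possible(ReportingIdentity.OBSERVED, True))      # True
print(thresholding_possible(ReportingIdentity.DEVICE_BASED, True))  # False
```

Notice that switching the identity only changes the function's input. The collected data (`has_signals_data`) stays the same, which is why you can switch back and forth safely.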
Sometimes a bug happens
Occasionally, I have noticed that the thresholding warning remains even after I change the reporting identity to device-based. In those cases, doing a hard refresh (CTRL + F5 on Windows) sometimes helps. If not, I ignore the warning because the reports start showing the rows with small numbers too.
Maybe when you are reading this, the issue is already fixed. But keep this in mind.
It’s not the end of the world (or is it?)
Sometimes yes, sometimes no.
Based on what I have seen, rows with small numbers (at least in the traffic acquisition report) usually account for less than 5% of all traffic. So that’s not a big deal for data accuracy because GA4 then tries to fill in some gaps with modeled data or User ID/Google Signals.
But there might also be situations where the impact is much larger. For example, small websites (that get just hundreds of visitors per day/week) might face a more significant challenge. Imagine that you cannot see half of your events in reports because there just aren’t many. Then you will be forced to stick with a device-based reporting identity.
So I would suggest regularly switching between reporting identity settings to double-check the impact. I wish there was a quick way to change the reporting identity directly in the reports/main interface (rather than going to the Admin section). Also, another wish would be to have a separate reporting identity that lets us continue using the User ID but not Google Signals.
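One way to double-check the impact is to export the same report twice (once with device-based identity, once with Observed) and compare the rows. Below is a rough Python sketch of that comparison. The function name and the sample rows are made up, and keep in mind that user counts differ slightly between identities (because of deduplication), so treat the result as an estimate.

```python
def hidden_share(device_based_rows, observed_rows):
    """Estimate what thresholding hides in the Observed export.

    Returns (missing_sources, share), where share is the fraction of
    device-based users that belong to rows missing from the Observed export.
    """
    observed_sources = {r["source"] for r in observed_rows}
    missing = [r for r in device_based_rows if r["source"] not in observed_sources]
    total_users = sum(r["users"] for r in device_based_rows)
    missing_users = sum(r["users"] for r in missing)
    share = missing_users / total_users if total_users else 0.0
    return [r["source"] for r in missing], share


# Made-up exports of the same report under two reporting identities.
device_based = [
    {"source": "google / organic", "users": 1200},
    {"source": "newsletter / email", "users": 310},
    {"source": "partner-blog / referral", "users": 42},
    {"source": "forum / referral", "users": 7},
]
observed = [
    {"source": "google / organic", "users": 1180},
    {"source": "newsletter / email", "users": 305},
]

missing, share = hidden_share(device_based, observed)
print(missing)
print(f"{share:.1%}")  # 3.1%
```

If the share stays in the low single digits, you can probably live with Observed. If it is much larger, that is a sign you should stick with device-based reporting.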
Thresholding applied in Google Analytics 4: Final words
This is one of those articles where I wish such a thing would not exist in GA4.
Data thresholding in Google Analytics 4 is not sampling. Those are different things. Thresholding is applied when your GA4 property meets all of these conditions:
- You have collected some data through Google Signals (by enabling them at some point)
- Your reporting identity is either Blended or Observed
- AND a report (that you’re looking at) contains rows with small user/event/session numbers (I don’t know the exact number, but I would say it should be 50 or below)
In that case, rows with small numbers will be hidden and not displayed in the report (even though that data is still available somewhere in the background).
To avoid data thresholding in the future, don’t enable Google Signals (if you don’t plan to use remarketing features or demographic reports in GA). If you have already done it, you can change the reporting identity to device-based whenever you want, and you are free to switch between them. This setting does not impact the data you have collected. It only affects the way numbers are calculated.
Does thresholding apply to data from the API as well?
Didn't try it yet
Yes, it does apply, but not in the BigQuery link.
Analytics Mania, you are a hero.
Will any of this matter if Google goes all in on Bard and changes how source links are displayed to (and valued by) users?
Your guess is as good as mine
As Jakub said, I assume that the data from GA4 that we display in Looker Studio will not be displayed either because of the threshold.
Hi Julius, very interesting blogpost, thank you!
So if I understand correctly: if you are not using user ID, you can use Google Signals ánd avoid thresholding at the same time (using device-based reporting)?
No. User ID is not causing thresholding problems. Google Signals is. So not using User ID but having Google Signals will not avoid thresholding.