Google explains sampling in the following way:
In data analysis, sampling is the practice of analyzing a subset of all data in order to uncover the meaningful information in the larger data set. For example, if you wanted to estimate the number of trees in a 100-acre area where the distribution of trees was fairly uniform, you could count the number of trees in 1 acre and multiply by 100, or count the trees in a half acre and multiply by 200 to get an accurate representation of the entire 100 acres.
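The arithmetic behind Google's tree example can be written out as a small sketch. The function name and the tree counts (120 and 60) are made up for illustration; only the areas come from the quote above.

```python
def extrapolate_total(sample_count, sample_area, total_area):
    """Scale a count taken on a sample area up to the full area.

    Only reasonable when the distribution is fairly uniform,
    as in the tree example quoted above.
    """
    if sample_area <= 0 or total_area <= 0:
        raise ValueError("areas must be positive")
    return sample_count * total_area / sample_area


# Hypothetical counts: 120 trees in 1 acre, 60 trees in half an acre.
print(extrapolate_total(120, 1, 100))    # count in 1 acre, multiplied by 100
print(extrapolate_total(60, 0.5, 100))   # count in 0.5 acre, multiplied by 200
```

Both calls arrive at the same estimate because the hypothetical counts are perfectly uniform. Real traffic data is rarely that even, which is where sampled reports start to drift from the true numbers.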
As of March 2020, Google states the following thresholds for sampling:
Google Analytics Standard: 500k sessions at the property level for the date range you are using
In some circumstances, you may see fewer than 500k sessions sampled. This can result from the complexity of your Analytics implementation, the use of view filters, query complexity for segmentation, or some combination of those factors. Although we make a best effort to sample up to 500k sessions, it’s normal to sometimes see slightly fewer than 500k sessions returned for an ad-hoc query.
Google Analytics 360: 100M sessions at the view level for the date range you are using
360 thresholds vary according to how queries are configured. For detailed information, contact your 360 support team.
How do you identify it, and why is it a problem?
When looking at reports in Google Analytics, you will usually see a green tick mark in the report. This means no sampling is applied.
If the tick mark appears in yellow, you are looking at a sampled report.
Now why exactly is this a problem?
If you are looking at a sample rate of 50% or more, the report may still help you analyze the demographics of your audience or similar high-level insights, but it will definitely not help you if you want to do any kind of comparative analysis. If you make decisions based on sampled data, you are basically working with inaccurate data. These decisions can lead to:
- A loss of trust in the data and a risk to the reputation of the data/marketing team
- A financial loss for the company as you make budgeting decisions based on incomplete data
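To get a feel for why comparisons suffer, here is a rough sketch of the uncertainty on a conversion rate when only part of a segment's sessions is analyzed. It uses the textbook normal approximation for a proportion and assumes sampled sessions behave like a simple random sample, which is not necessarily how Google Analytics samples. The function name and all numbers are hypothetical.

```python
import math


def margin_of_error(conversion_rate, sampled_sessions, z=1.96):
    """Approximate 95% margin of error for a conversion rate.

    Assumes simple random sampling and uses the normal
    approximation; this is an illustration, not Google's method.
    """
    if sampled_sessions <= 0:
        raise ValueError("sampled_sessions must be positive")
    variance = conversion_rate * (1 - conversion_rate) / sampled_sessions
    return z * math.sqrt(variance)


# Hypothetical segment: 2,000 sessions with a 3% conversion rate.
segment_sessions = 2000
for sample_rate in (1.0, 0.5, 0.1):
    analyzed = int(segment_sessions * sample_rate)
    moe = margin_of_error(0.03, analyzed)
    print(f"sample rate {sample_rate:.0%}: 3.0% +/- {moe * 100:.1f} points")
```

At a 10% sample rate, the margin in this example is almost as large as the conversion rate itself, so a difference between two such segments could easily be noise.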
So let’s explore the options you have to avoid sampled data.
Sampling: How to avoid it when working with data
Option 1: Work with standard Google Analytics reports
Google does not sample the standard reports in Google Analytics. This means you are safe from sampling when you look at any standard report. If you only look at top-level metrics, this is the way to go. Chances are, however, that this will not suffice for you. Especially if you have come far enough to end up with a sampled view, I doubt that looking at these top-level reports will bring you one step further ;-).
Option 2: Use short date ranges
Another way to avoid sampling is to use a short date range. If you reduce the range from monthly to weekly or even daily, the sampling will disappear at some point. This approach works for very short date ranges but makes the analysis of longer periods hard, as you would need to export the reports into Google Sheets documents or CSV files and then somehow patch them together (which is a time waster you should probably avoid).
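If you do go down this road, the splitting and patching can at least be scripted. The following is a minimal sketch in Python that only covers the date arithmetic and the merging step. How you export each chunk (manual CSV export, API, or otherwise) is left open, and the function names and session counts are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def split_date_range(start, end, days_per_chunk=7):
    """Split the inclusive range [start, end] into consecutive chunks."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    if end < start:
        raise ValueError("end must not be before start")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # The last chunk is cut off at the end of the overall range.
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


def merge_chunk_totals(chunk_totals):
    """Add up per-channel totals exported for each chunk."""
    merged = Counter()
    for totals in chunk_totals:
        merged.update(totals)  # Counter.update adds the values per key
    return dict(merged)


# One month split into weekly chunks.
for chunk_start, chunk_end in split_date_range(date(2020, 3, 1), date(2020, 3, 31)):
    print(chunk_start.isoformat(), "to", chunk_end.isoformat())

# Hypothetical session counts per channel from two exported chunks.
print(merge_chunk_totals([{"organic": 10, "paid": 5}, {"organic": 7}]))
```

Note that only additive metrics such as sessions or pageviews can be summed across chunks like this. Unique users cannot, because the same user may appear in several chunks.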
Option 3: Buy Google Analytics 360
As you can see from the thresholds at the beginning of this article, the sampling threshold for Google Analytics 360 (aka Google Analytics Premium) is much higher (500k sessions on Google Analytics Standard vs. 100M sessions on Google Analytics 360). The issue we see here is that it comes with a hefty price tag starting at around $150k per year. Of course, it comes with other features besides reduced sampling, but reduced sampling is clearly the most important one.
Option 4: Use Windsor.ai
Another option is to use Windsor.ai to extract all your data unsampled. For those looking at media KPIs, it also connects the unsampled data to the costs from your various sources (Google Ads, Facebook, Bing, LinkedIn, DCM, …) and makes it available for you to work with in raw format via API, Google Data Studio, Microsoft Power BI or our own dashboard.
The steps to get started for free are:
- Connect your Google Analytics and your cost data
- Load data for a date range of 20 to 30 days to get your unsampled insights
- Set up your dashboard in the platform of your choice (links above) and analyse the data
- (Optional) Customize the setup to connect your Google Analytics data with your CRM or e-Commerce data or enable pulling and visualization of custom dimensions from your Google Analytics setup
Which option you choose depends greatly on your technical abilities and your wallet. The most important takeaway, which I'm sure you have understood by now, is that making decisions based on sampled data leads to many problems.
If you have another way of tackling this problem, feel free to share it with us.