Is BigQuery part of your data stack? If yes, this tutorial will guide you through setting up your automated data pipelines. Google has great support for its own products, but in 99% of cases you also have other platforms in your marketing stack. In this article and the accompanying YouTube tutorial we will show you how to stream all your analytics, CRM and media data into BigQuery. The currently supported platforms for this tutorial are:
Google Ads | Facebook Ads | Bing Ads | Google Analytics MCF
LinkedIn Ads | TikTok Ads | AdRoll Ads | HubSpot
Criteo | RTB House | Twitter Ads | Salesforce
AppNexus | Google Display & Video 360 | Google Campaign Manager | Stripe
Taboola | Outbrain | Amazon Seller Central & Amazon MWS | Shopify
Prerequisites before getting started
- Access to all the channels you would like to integrate
- Google Cloud account with BigQuery
There are a few steps involved in getting your data pipelines set up:
- Connecting your data
- Streaming of the data into BigQuery
- Option 1: Manual upload
- Option 2: Automated upload
The main focus of this article will be on setting up data pipelines to BigQuery.
Let’s get started.
Connecting your data
To begin with, we have to connect our data to an ETL platform. In this example we’ll use our platform, where I connected the data of the following channels:
- Google Analytics
- HubSpot CRM
- Google Ads
- Facebook Ads
- LinkedIn Ads
Of course you can add channels as you like (see the table at the beginning of this article for more details).
To get started with connecting your data, simply head to our setup page and sign up for a trial to see it in action.
The platform is self-explanatory and requires only a few clicks. If you think it’s too complex, you can check out which buttons to click here.
Getting your data into BigQuery
For this, we have two options. Option one is the more boring one: a manual JSON upload, meaning we export a file and import it by hand. Option two is the automated data upload. I will quickly show option one, but if you want to automate your data pipelines, you will probably want to focus on option two. So let’s be quick with option one.
Option 1: Manual upload
Here are the links I’m using in Option 1:
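The manual route itself needs no scripting: you export a JSON file from the platform and import it through the BigQuery web UI. If you prefer the command line, the sketch below shows roughly the same one-off load with the bq tool; the dataset, table and file names are placeholders, and it assumes the exported file is newline-delimited JSON, which is the format bq expects for JSON loads.

```bash
# One-off manual load of an exported JSON file into BigQuery.
# Assumptions: the bq CLI (part of the Google Cloud SDK) is installed,
# "marketing_data.ad_performance" is a placeholder dataset.table,
# and export.json is newline-delimited JSON.
bq load \
  --source_format=NEWLINE_DELIMITED_JSON \
  --autodetect \
  marketing_data.ad_performance \
  ./export.json
```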
Option 2: Automated upload
Here are the links I’m using in Option 2:
- Google BigQuery
- Google Cloud Compute Engine (The free instance type I mention is called f1-micro)
- Crontab
- Windsor.ai Dashboard
Here you can generate your own API URL easily:
To create advanced API URLs, please check out our API Documentation. For more information on the fields available on api.windsor.ai, you can check the table definitions here.
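To give you an idea of what the script we will schedule in the next section can look like, here is a minimal sketch of the baby.sh fetch-and-load script. It is only a sketch under a few assumptions: the API URL is a placeholder for the one you generated above, the response is assumed to be a plain JSON array that jq can flatten into newline-delimited JSON, and the dataset and table names are placeholders you replace with your own.

```bash
#!/bin/bash
# baby.sh - minimal sketch: pull data from the generated API URL and load it into BigQuery.
# Assumptions: API_URL is a placeholder for the URL generated above, the response is a
# plain JSON array, and marketing_data.ad_performance is a placeholder dataset.table.
set -euo pipefail

API_URL="https://your-generated-api-url"     # paste the URL from the generator above
OUTFILE="/tmp/windsor_export.ndjson"

# Fetch the data and convert it to newline-delimited JSON for bq.
# If your export nests the rows under a key, adjust the filter (e.g. '.data[]').
curl -s "$API_URL" | jq -c '.[]' > "$OUTFILE"

# Replace the table content with the fresh export.
bq load \
  --replace \
  --source_format=NEWLINE_DELIMITED_JSON \
  --autodetect \
  marketing_data.ad_performance \
  "$OUTFILE"
```

If your export is already newline-delimited JSON or CSV, you can drop the jq step and adjust --source_format accordingly.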
Scheduling shell scripts on Google Cloud Compute Engine (taken from this post)
- In the Google Cloud Platform Console, click on Products & Services, which is the icon with the four bars in the top left-hand corner.
- In the menu, go to the Compute section, hover over Compute Engine and then click on VM Instances.
- In the menu bar above the instance section you will see a Create Instance button. Click it and fill in the configuration values you want your new instance to have; among other values, you can choose the name, zone and machine type for your new instance.
- In the Machine type section click the drop-down menu tab to select an “f1-micro instance”.
- In the Identity and API access section, give access scope to the Storage API so that you can read and write to your bucket in case you need to do so; the default access scope only allows you to read. Also enable the BigQuery API access scope so the instance can load data into BigQuery.
- Once you have the instance created and access to the bucket, just create your cron job inside your new instance: in the user account under which the cron job will execute, run crontab -e and add an entry that runs your baby.sh script. The following documentation link should help you with this.
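As a concrete example, a crontab entry like the one below runs the script every morning; the path, schedule and log file are placeholders you adapt to your instance.

```bash
# Run crontab -e on the instance and add a line like this
# (placeholder path and schedule: runs baby.sh every day at 06:00
# and appends its output to a log file for debugging):
0 6 * * * /home/your_user/baby.sh >> /home/your_user/baby.log 2>&1
```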
Bonus (I did not cover this in the video)
Do you want to fetch data from the connectors directly? You can try to get additional fields which are not in the standard table definitions (such as custom dimensions, metrics and additional media metrics). You can start building your connector queries here.
Did you like this article or do you have any questions? We would love to have your feedback. Just contact us on the chat!