In our previous blog, we introduced and explained the concept of clean rooms; in particular, Google’s privacy-centric data warehouse solution – Ads Data Hub (ADH). ADH provides a clean room environment where advertisers can continue to use their data modeling capabilities within Google’s walled garden and also acquire insights to help optimize the campaign.
But, ADH can sometimes be tricky as a data warehouse solution. Its privacy-centric nature can make it challenging to get dynamic insights for a live campaign if the objective is to refresh the data after every two or three days. MiQ needed to come up with a solution to get granular insights on a live campaign using ADH data sources and also refresh the data for the insights after a particular frequency. Read on to understand the challenges and how we solved them.
Each time we write a query in ADH, we need to keep in mind the three privacy checks provided by ADH:
This helps with the immediate noticeable privacy concern in our queries even before we can run it using ADH UI or API (instant exception thrown).
Since Ads Data Hub writes the results to a BigQuery table, sometimes we may want to transform those results into a presentable format before sharing them with our teams/clients. This can be done by:
This is where Google Data Studio (GDS) comes in. It serves as a great data visualization tool and is part of the Google Marketing Platform (GMP). It helps in building interactive dashboards, and customized, beautiful reporting. We use GDS to track critical KPIs for clients, visualize trends, and compare performance over time.
Running an ADH query once & getting results in BigQuery
Figure 1: Ideal flow for data visualization from ADH
Figure 2: BQ Table for insights on line items (Job 1)
Let’s start by understanding a basic flow (Figure 1). Imagine you are an ad agency running a campaign for an advertising client, and their DV360 account is linked to your ADH account. For the campaign, you have created a single insertion order (IO) with a set of 5 line items and you want to get insights on all the 5 line items (Figure 2) by writing an ADH query. The ADH – DV360 data sources could be used as shown in Figure 3 to help create the required query.
Figure 3: ADH Query
It’s always a good practice to mention a Filtered Row Summary (FRS) before we execute our query. In this case, to specify the FRS, ‘line_item’ and ‘line_item_id’ can be kept as ‘ConstantConstant’, and ‘impressions’, ‘conversions’, and ‘media_cost’ can be kept as ‘SumSum’. Once your ADH query execution is completed and the data is dumped in BigQuery, it can be easily rendered in the GDS dashboard using the BigQuery connector.
This was quite straightforward as it was a one-time job, and if you are lucky none of the rows will get filtered out due to aggregation checks.
Running an ADH query with a refresh frequency & getting results in BigQuery
Figure 4: BQ Table for insights on line items (Job 2)
Now let’s consider another scenario where we need to schedule the previously explained ADH query and pull the latest data in the BigQuery table.
Imagine you run the same query the next day and you may get the result something similar to Figure 4. Now if we try to acquire insights using this BigQuery table, it will be quite misleading. What went wrong in your second run where three of the line items didn’t show up individually and instead were filtered?
This happens because ADH isn’t a straightforward data warehouse solution where the data could just be queried and retrieved. In the second job scenario, due to difference check, ADH may have detected some vulnerabilities on the level of individual users when comparing a job’s results to previous results, as there may be a large number of overlapping users, and so, a few line items get filtered out. Common blockers users face at this point:
We understand the limitations of retrieving data from the same ADH query by running it multiple times. So, what can be done to get near real-time data for insights on a live campaign with a refresh frequency of two or three days and get the most data out from the filtered row summary?
We tried solving this limitation by including a separate data flow architecture (Figure 5) in our workflow from the second run onwards.
Figure 6: BQ table insight on line items as per the new architecture
People may tend to ask, why not perform a union of the table from the current run (running the query for only fresh days instead of running it from Day 1) and the previous run and then the group at ‘line_item’ and ‘line_item_id’ level and simply add up all dimensions and recalculate the metrics?
The explanation is quite simple.
Let’s say ‘Line Item ID 3’ gets filtered in the second run. So that particular data will be part of the Filtered Row Summary. Then the same line-item reappears in the third run, following which you may just add up the dimensions like ‘impressions’, ‘conversions’, and ‘media_cost’. Now, this data will be misleading because the data for that line item doesn’t include figures from the second run. The idea of just grouping the data and summing up the dimensions is not a good one.
Figure 7: BQ table insights on date-level
Though the above argument for running the ADH query for the last one or two days instead of running it from Day one of the campaign may work better when we are trying to get insights for a particular date.
If we already have an ADH query and a master table with date level trends from 1st-3rd January 20211, but we want to get the data for the 4th & 5th January, then we could try running the ADH query for only those two days and then merge the result to the master table (Figure 7).
Then, the date level insights would also help us with some other level (eg: ‘day of week’ level) insights.
Figure 8: Data Studio Dashboard using the master table.
Using the above-suggested solution in all our future runs can make sure all the new rows are updated, and rows that aren’t updated will have the data from the previous runs. This gives us another insight into the rows, which aren’t updating for a long time that the line items may be performing poorly or may have been deactivated. The above flow will help in getting the most details by running the same ADH query with a particular frequency. Even if the rows are filtered out we would have a snapshot of old data to be shown in the dashboard.
The master table (Figure 6) which gets refreshed after each run can be used in the Data Studio dashboard (Figure 8) for tracking clients’ KPIs and visualizing trends. Each time the master table is refreshed we can refresh the dashboard to render the fresh data (highlighted in red). This will allow for important decisions to be made during the run of the campaign to optimize, improve performance, and, ultimately maximize profits.
Can you see yourself innovating in clean rooms like ADH for leading advertising brands? Check out our life at MiQ page.
Want more tech talk? Our Medium blog covers future-shaping data science and analytics innovation from the MiQ team.