Before you begin: This document mentions only data sources, but everything in it applies to channels as well.
- "Auto" means that the calculation follows the TapClicks default logic:
  - Based on the calculation formula, the code decides automatically whether the calculation is computed after the aggregation (that is, in a "post-aggregation" phase) or before it.
- Setting "Post Aggregation" to "On Final Data Set" or "On Initial Data Set" bypasses the automatic behavior and lets the user decide when the calculation is computed:
  - "On Final Data Set": The calculation is computed post-aggregation.
  - "On Initial Data Set": The calculation is computed pre-aggregation.
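The two override modes can be sketched in Python as follows. This is only an illustration of the ordering, not TapClicks code: the function name `compute_calculation`, the mode strings `"final"` and `"initial"`, and the row layout (a group key plus numeric metrics) are all assumptions made for the example.

```python
from collections import defaultdict

def compute_calculation(rows, group_key, formula, mode,
                        metric_agg=sum, calc_agg=sum):
    """Hypothetical sketch: apply `formula` before or after aggregation.

    rows       -- list of dicts holding the group key and numeric metrics only
    formula    -- function that takes a dict of metrics and returns a number
    mode       -- "final" (post-aggregation) or "initial" (pre-aggregation)
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = {}
    for key, members in groups.items():
        if mode == "final":
            # Aggregate every metric first, then apply the formula once per group.
            names = [n for n in members[0] if n != group_key]
            totals = {n: metric_agg(r[n] for r in members) for n in names}
            result[key] = formula(totals)
        elif mode == "initial":
            # Apply the formula to each row, then aggregate the per-row values.
            result[key] = calc_agg([formula(r) for r in members])
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return result
```

The only difference between the two branches is the order of "aggregate" and "apply formula", which is exactly what the setting controls.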
Example #1
To illustrate the operation of the data pipeline described above, let’s create a simplified example with this database:
We want to create a widget grouped by campaign name, and add a calculation that computes the CTR value (100*clicks/impressions). The standard SUM aggregate function is used when grouping clicks and impressions.
On Final Data Set Result
If the calculation is computed after the aggregation (i.e., a post-aggregation calculation), then data is first grouped like this:
Then the calculation is computed as follows:
On Initial Data Set Result
In this case, the calculation is computed for each database row (i.e., a pre-aggregation calculation):
The AVERAGE aggregate function is then used when grouping the calculated values, which gives this result:
The two results differ, so it matters whether the calculation is computed before or after the aggregation. Here the correct choice is "On Final Data Set": CTR is a ratio, and averaging per-row ratios ignores how many impressions each row contributed.
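To see the difference with concrete numbers, here is a small Python sketch. The rows are made up for illustration and are not the database used in this example; the variable names are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (campaign, clicks, impressions).
rows = [
    ("Campaign A", 30, 100),
    ("Campaign A", 20, 400),
    ("Campaign B", 8, 200),
]

def ctr(clicks, impressions):
    # Same formula as in the text: 100 * clicks / impressions.
    return 100 * clicks / impressions

by_campaign = defaultdict(list)
for campaign, clicks, impressions in rows:
    by_campaign[campaign].append((clicks, impressions))

ctr_final = {}    # "On Final Data Set": SUM the metrics, then compute CTR
ctr_initial = {}  # "On Initial Data Set": compute CTR per row, then AVERAGE
for campaign, members in by_campaign.items():
    total_clicks = sum(c for c, _ in members)
    total_impressions = sum(i for _, i in members)
    ctr_final[campaign] = ctr(total_clicks, total_impressions)
    ctr_initial[campaign] = mean(ctr(c, i) for c, i in members)

print(ctr_final)    # Campaign A: 10.0, Campaign B: 4.0
print(ctr_initial)  # Campaign A: 17.5, Campaign B: 4.0
```

For Campaign A, the per-row rates are 30% and 5%. Their plain average is 17.5%, but the campaign actually received 50 clicks on 500 impressions, which is 10%. The low-volume row is over-weighted when the calculation runs before the aggregation.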
Example #2
To measure the performance of a campaign, we would like to get the number of days on which the number of clicks is at least 100. We'll use the same database and the same widget configuration. This time, the calculation is defined using this formula:
IF (Clicks >= 100) THEN 1 ELSE 0 END
The SUM aggregation function is used in this case. With the calculation computed on the final data set, we get this final result:
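This is clearly not the expected result: the condition is evaluated once on each campaign's summed clicks, so every campaign gets either 0 or 1 instead of a count of days. With the calculation computed on the initial data set, the calculation is first computed at the database row level: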
Then, when aggregated, the final result is now correct:
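The same contrast can be reproduced with a short Python sketch. The daily rows below are hypothetical (they are not the database from this example), and `flag` is simply a Python rendering of the formula above.

```python
from collections import defaultdict

# Hypothetical daily rows: (campaign, date, clicks).
rows = [
    ("Campaign A", "2024-01-01", 120),
    ("Campaign A", "2024-01-02", 80),
    ("Campaign A", "2024-01-03", 150),
    ("Campaign B", "2024-01-01", 40),
    ("Campaign B", "2024-01-02", 70),
]

def flag(clicks):
    # Python rendering of: IF (Clicks >= 100) THEN 1 ELSE 0 END
    return 1 if clicks >= 100 else 0

clicks_by_campaign = defaultdict(list)
for campaign, _date, clicks in rows:
    clicks_by_campaign[campaign].append(clicks)

# "On Final Data Set": SUM clicks per campaign, then evaluate the formula once.
on_final = {c: flag(sum(v)) for c, v in clicks_by_campaign.items()}

# "On Initial Data Set": evaluate the formula per day, then SUM the flags.
on_initial = {c: sum(flag(x) for x in v) for c, v in clicks_by_campaign.items()}

print(on_final)    # {'Campaign A': 1, 'Campaign B': 1}
print(on_initial)  # {'Campaign A': 2, 'Campaign B': 0}
```

Campaign B never reaches 100 clicks on any single day, yet the post-aggregation variant reports 1 because its total of 110 clicks passes the test. Only the pre-aggregation variant counts days.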