Understanding Calculation Aggregation Options – TapClicks

Before you begin: This document mentions only data sources, but everything described here applies to channels as well.

Calculating "On Initial Data Set" means that the calculation is computed for each row of the database, whereas calculating "On Final Data Set" means that it is computed on the data after it has been reduced by the aggregate function. Depending on the calculation formula you have created, TapClicks automatically determines where in the pipeline the calculation should be computed: before or after the data is reduced. However, there are times when you may want to override the default behavior. This is why the "Post Aggregation" option is exposed when building a calculation. The options you can select are as follows:

"Auto" means that the calculation behaves according to the TapClicks default logic:
- Based on the calculation formula, TapClicks decides automatically whether the calculation is computed after the aggregation (that is, in a "post-aggregation" phase) or before it.
Setting "Post Aggregation" to "On Final Data Set" or "On Initial Data Set" bypasses the automatic behavior and lets the user decide when the calculation is computed:
- "On Final Data Set": The calculation is computed post-aggregation.
- "On Initial Data Set": The calculation is computed pre-aggregation.
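
The difference between the two override options can be sketched in a few lines of Python. This is only an illustration of the order of operations, not TapClicks code: the function names, the row format (a list of dictionaries), and the use of SUM for the metrics in the post-aggregation path are all assumptions made for the sketch.

```python
from collections import defaultdict


def group_rows(rows, key):
    """Group rows (dicts) by the value stored under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def on_initial_data_set(rows, key, formula, aggregate):
    """Pre-aggregation: apply the formula to every row, then aggregate the results."""
    return {
        name: aggregate([formula(row) for row in group])
        for name, group in group_rows(rows, key).items()
    }


def on_final_data_set(rows, key, metrics, formula):
    """Post-aggregation: SUM each metric per group, then apply the formula once."""
    result = {}
    for name, group in group_rows(rows, key).items():
        reduced = {metric: sum(row[metric] for row in group) for metric in metrics}
        result[name] = formula(reduced)
    return result
```

Both functions take the same rows and the same formula; the only thing that changes is whether the formula runs before or after the rows are reduced.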

Note: If your calculation formula already contains a custom aggregate function, the "Post Aggregation" option cannot be used.

Example #1

To illustrate the operation of the data pipeline described above, let’s create a simplified example with this database:

We want to create a widget grouped by campaign name, and add a calculation that computes the CTR value (100*clicks/impressions). The standard SUM aggregate function is used when grouping clicks and impressions.

On Final Data Set Result

If the calculation is computed after the aggregation (i.e., a post-aggregation calculation), then data is first grouped like this:

Then the calculation is computed as follows:

On Initial Data Set Result

In this case, the calculation is computed for each database row (i.e., a pre-aggregation calculation):

The AVERAGE aggregate function is then used when grouping the calculation data, which gives this result:

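The result is different, so the choice of computing the calculation before or after the aggregation matters. Here, the correct choice is clearly the "On Final Data Set" calculation, because averaging the per-row CTR values gives every row the same weight regardless of how many impressions it represents.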
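
To make the arithmetic concrete, here is a small Python sketch of both computations. The rows below are hypothetical values chosen for illustration, not the data from the example above, and the variable names are assumptions.

```python
from collections import defaultdict

# Hypothetical rows (one per campaign per day); not the article's data.
rows = [
    {"campaign": "A", "clicks": 10, "impressions": 100},
    {"campaign": "A", "clicks": 10, "impressions": 400},
    {"campaign": "B", "clicks": 5, "impressions": 100},
    {"campaign": "B", "clicks": 15, "impressions": 300},
]


def ctr(clicks, impressions):
    """CTR as a percentage: 100 * clicks / impressions."""
    return 100 * clicks / impressions


groups = defaultdict(list)
for row in rows:
    groups[row["campaign"]].append(row)

# On Final Data Set: SUM clicks and impressions first, then apply the formula.
ctr_final = {
    name: ctr(sum(r["clicks"] for r in g), sum(r["impressions"] for r in g))
    for name, g in groups.items()
}

# On Initial Data Set: apply the formula per row, then AVERAGE the results.
ctr_initial = {
    name: sum(ctr(r["clicks"], r["impressions"]) for r in g) / len(g)
    for name, g in groups.items()
}

print(ctr_final)    # {'A': 4.0, 'B': 5.0}
print(ctr_initial)  # {'A': 6.25, 'B': 5.0}
```

Campaign A received 20 clicks on 500 impressions, so its CTR is 4%. Averaging the two daily values (10% and 2.5%) yields 6.25%, which overstates the result because the low-volume day counts as much as the high-volume day. Campaign B happens to give the same answer both ways because its daily CTR is constant.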

Example #2

To measure the performance of a campaign, we would like to get the number of days on which the number of clicks is at least 100. We'll use the same database and the same widget configuration. This time, the calculation is defined using this formula:

IF (Clicks >= 100) THEN 1 ELSE 0 END

The SUM aggregate function is used in this case. With the calculation computed on the final data set, we get this result:

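This is clearly not the expected result: the formula is applied to the total clicks of each campaign rather than to each day, so it can only return 0 or 1. With the calculation computed on the initial data set, the calculation is first computed at the database row level: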

Then, when aggregated, the final result is now correct:
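
The same comparison can be sketched in Python for this formula. Again, the daily click counts are hypothetical values picked for illustration, and the names are assumptions rather than anything defined by TapClicks.

```python
from collections import defaultdict

# Hypothetical daily clicks per campaign; not the article's data.
daily_clicks = [
    ("A", 120),
    ("A", 80),
    ("A", 150),
    ("B", 60),
    ("B", 50),
]


def flag(clicks):
    """IF (Clicks >= 100) THEN 1 ELSE 0 END"""
    return 1 if clicks >= 100 else 0


clicks_by_campaign = defaultdict(list)
for campaign, clicks in daily_clicks:
    clicks_by_campaign[campaign].append(clicks)

# On Final Data Set: SUM the clicks first, then apply the formula to the total.
days_final = {name: flag(sum(values)) for name, values in clicks_by_campaign.items()}

# On Initial Data Set: apply the formula per day, then SUM the flags.
days_initial = {
    name: sum(flag(c) for c in values) for name, values in clicks_by_campaign.items()
}

print(days_final)    # {'A': 1, 'B': 1}
print(days_initial)  # {'A': 2, 'B': 0}
```

With these values, campaign B never reaches 100 clicks on any single day, yet the post-aggregation path reports 1 because its total (110) passes the threshold. Only the pre-aggregation path counts days.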
