Aggregations

Aggregations enable you to accumulate data for statistical analysis. Each aggregation is calculated at runtime, so no post-processing is required, and processing is efficient and accurate. Aggregations function similarly to associative arrays, but are populated by aggregating functions. In D, the syntax for an aggregation is as follows:

@name[ keys ] = aggfunc( args );

The aggregation name is a D identifier that's prefixed with the special character @. All aggregations that are named in D programs are global variables. Aggregations can't have thread-local or clause-local scope. The aggregation names are kept in an identifier namespace that's separate from other D global variables. If you reuse names, remember that a and @a are not the same variable. The special aggregation name @ can be used to name an anonymous aggregation in D programs. The D compiler treats this name as an alias for the aggregation name @_.

Aggregations can be regular or indexed. Indexed aggregations use keys, where keys are a comma-separated list of D expressions, similar to the tuples of expressions used for associative arrays. Regular aggregations are treated similarly to indexed aggregations, but don't use keys for indexing.

The aggfunc is one of the DTrace aggregating functions, and args is a comma-separated list of arguments appropriate to that function. Most aggregating functions take a single argument that represents the new datum.
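The following D program is a minimal sketch of the three forms described above: a regular aggregation, an indexed aggregation, and the anonymous aggregation. The aggregation names @total and @calls are hypothetical and chosen only for illustration. The sketch assumes that the syscall provider is available on the system.

```d
/* Regular aggregation: no keys, a single running count of all system calls. */
syscall:::entry
{
  @total = count();
}

/* Indexed aggregation: one count per (process name, system call name) tuple. */
syscall:::entry
{
  @calls[execname, probefunc] = count();
}

/* Anonymous aggregation: @ is treated as an alias for @_. */
syscall:::entry
{
  @[pid] = count();
}
```

A script like this is typically run with dtrace -s and usually requires elevated privileges. Because none of the aggregations is printed explicitly, all three are printed with the default format when tracing ends.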

Aggregation Functions

The following functions are aggregating functions that can be used in a program to collect data and present it in a meaningful way.

  • avg: Stores the arithmetic average of the specified expressions in an aggregation.

  • count: Stores an incremented count value in an aggregation.

  • max: Stores the largest value among the specified expressions in an aggregation.

  • min: Stores the smallest value among the specified expressions in an aggregation.

  • sum: Stores the total value of the specified expression in an aggregation.

  • stddev: Stores the standard deviation of the specified expressions in an aggregation.

  • quantize: Stores a power-of-two frequency distribution of the values of the specified expressions in an aggregation. An optional increment can be specified.

  • lquantize: Stores the linear frequency distribution of the values of the specified expressions, sized by the specified range, in an aggregation.

  • llquantize: Stores the log-linear frequency distribution of the values of the specified expressions in an aggregation.
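As an illustration of how the functions in this list are called, the following sketch applies several of them to the size of each write() request, keyed by process name. The aggregation names are hypothetical. The ranges passed to lquantize and llquantize are arbitrary values chosen for the example, and the llquantize argument order shown (value, factor, low magnitude, high magnitude, steps per magnitude) is an assumption that you should check against the reference for the DTrace version in use.

```d
syscall::write:entry
{
  /* arg2 is the number of bytes requested in the write() call. */
  @writes[execname] = count();            /* number of calls */
  @bytes[execname] = sum(arg2);           /* total bytes requested */
  @mean[execname] = avg(arg2);            /* arithmetic average */
  @smallest[execname] = min(arg2);        /* smallest request */
  @largest[execname] = max(arg2);         /* largest request */
  @spread[execname] = stddev(arg2);       /* standard deviation */

  /* Power-of-two frequency distribution of request sizes. */
  @pow2[execname] = quantize(arg2);

  /* Linear distribution from 0 to 4096 bytes in steps of 512. */
  @linear[execname] = lquantize(arg2, 0, 4096, 512);

  /* Log-linear distribution: factor 10, magnitudes 0 to 6, 10 steps each. */
  @loglin[execname] = llquantize(arg2, 10, 0, 6, 10);
}
```

Note that count takes no arguments, while the other functions take the new datum as their first argument.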

Printing Aggregations

By default, multiple aggregations are displayed in the order in which they're introduced in the D program. You can override this behavior by using the printa function to print the aggregations. The printa function also lets you precisely format the aggregation data by using a format string.

If an aggregation isn't formatted with a printa statement in a D program, the dtrace command snapshots the aggregation data and prints the results after tracing has completed, using the default aggregation format. If an aggregation is formatted with a printa statement, the default behavior is disabled. You can reproduce the default output by adding the printa(@aggregation-name) statement to an END probe clause in a program.

The default output format for the avg, count, min, max, stddev, and sum aggregating functions displays a decimal integer value corresponding to the aggregated value for each tuple. The default output format for the quantize, lquantize, and llquantize aggregating functions displays an ASCII histogram with the results. Aggregation tuples are printed as though trace had been applied to each tuple element.
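The following sketch shows one way that printa might be used with a format string in an END clause, which replaces the default output for the aggregations it names. The aggregation names @writes and @bytes and the column layout are hypothetical. In the format string, %@ marks where an aggregated value is printed, and the remaining conversions apply to the tuple keys. The sketch assumes a DTrace version that accepts more than one aggregation in a single printa call, and that both aggregations use the same key types.

```d
syscall::write:entry
{
  @writes[execname] = count();
  @bytes[execname] = sum(arg2);
}

END
{
  /* Column headings, then one row per execname key. */
  printf("%-16s %8s %12s\n", "EXECNAME", "WRITES", "BYTES");
  printa("%-16s %@8d %@12d\n", @writes, @bytes);
}
```

Calling printa(@writes) without a format string instead prints that aggregation in the default format.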

Data Normalization

When aggregating data over some period, you might want to normalize the data based on some constant factor. This technique lets you compare disjoint data more easily. For example, when aggregating system calls, you might want to output system calls as a per-second rate instead of as an absolute value over the course of the run. The DTrace normalize function lets you normalize data in this way. The parameters to normalize are an aggregation and a normalization factor. The output of the aggregation shows each value divided by the normalization factor.
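The per-second example described above can be sketched as follows. The global variable start and the aggregation name @rate are hypothetical. The sketch relies on the built-in timestamp variable, which counts nanoseconds, and guards against a run shorter than one second so that the normalization factor is never zero.

```d
BEGIN
{
  /* Record the time at which tracing started, in nanoseconds. */
  start = timestamp;
}

syscall:::entry
{
  @rate[execname] = count();
}

END
{
  /* Elapsed time in whole seconds. */
  this->secs = (timestamp - start) / 1000000000;

  /* Divide each count by the elapsed seconds to show a per-second rate. */
  normalize(@rate, this->secs > 0 ? this->secs : 1);
  printa(@rate);
}
```

Normalization affects only how the values are displayed. Because the division is an integer division, rates below one call per second are shown as 0.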