Aggregation instances
The data type of aggregation instances includes all view dimensions and attributes as well as several implicit properties:
- level identifier - the level identifier ( L ) defines the aggregation level the instance belongs to. The level identifier is a character array (char[8]) with the name level_id.
- dimensions - Level dimension attributes ( K ) as being defined in the grouping clause.
- view properties - persistent and transient view properties as being defined in the SELECT statement or VIEW definition.
- partitions - collections (relationships) referring to subset of next lower level instances ( S(I,i) ). Partition names are derived from dimension attributes preceding attribute names with __ (e.g. __personId, __projectId, __iDate in the example)
Aggregation attributes are level identifier, dimension attributes, aggregation attributes and transient properties. Dimension attributes are all grouping attributes defined in the VIEW definition or SELECT. When no group clause has been defined, key components of the order key are supposed to define dimension attributes. Aggregation attributes listed in a VIEW definition's attribute list or in a SELECT clause are, typically, calculated by means of aggregation functions. Finally, attribute lists may contain a number of transient attributes, which are not result of aggregation functions.
Each aggregation instance is extended by a level identifier (char __level_id[8]) array, which defines the dimension levels for the grouping instance (the maximum number of dimensions is 8). Each element contains the dimension level for the corresponding dimension attribute in the hierarchy as character value. Dimension level 0 indicates that this dimension has reached its top aggregation level (dimension excluded) and key component values for this dimension will be ignored.
The "total" aggregation instance, which does not contain dimension attribute values (dimension key components are all excluded) has the character value 00000000. When, e.g. defining grouping attributes personId, projectId and iDate for time sheets, the aggregation on personId level gets the level identifier 10000000. Level personId|iDate(year,month,day) gets identifier 10300000 etc. In case of complex dimension attributes, the dimension level may also contain higher values than 1 (maximum is the number of attributes in the complex dimension). The maximum dimension level number is 9.
Dimension attributes K(i) with maximum level number (L(i) = Dim(i)) in the level identifier refer to non-aggregated dimensions (e.g. 10300000 refers to personId|(year,month,day) while 10100000 refers to personId|year). Dimension attributes with level less than Dim(i) refer to aggregated dimension levels, that may be aggregated further on. Dimension attributes with level number 0 in the level identifier refer to final aggregated dimensions.
In order to get aggregation instances for a certain aggregation level, a key filter condition (fixing level identifierkey component) may be set with the level identifier.
Level identifier conversion and naming functions are provided in order to simplify usage of level identifiers.
In an aggregation schema, all dimension attributes become key component attributes of the aggregation data type. The primary key for the aggregation data type consists of level identifier and dimension attributes. The sequence of dimension attributes defines the order key as long as no different sort order has been defined. Finally the sequence of the order key define the position of dimension level in the level identifier.
Since the attribute check is performed by name, the aggregation attribute list must not contain dimension attribute names, because than the dimension attribute will be added with its original dimension name, once more.
Aggregation values are, typically, attributes or derived attributes from the source data type. When defining attributes, those might be referenced directly in the data source (like time or time=sum(duration)) or by referring to an OSI expression or function. In an aggregation schema, aggregation values usually refer to a data source that defines the aggregation rule. Default aggregation rules that may be referenced in an aggregation statement (SELECT()) or aggregation schema are:
- minimum
- maximum
- sum
- average (should not be used)€
- variance (should not be used)
- deviation (should not be used)
- statistic (provides count, sum and square sum)
Same aggregation functions used for calculating view attributes (low level aggregation) are also used for calculating aggregation attributes on higher levels. Since higher level aggregations are based on aggregated values, one should not use average(), variance() or deviation() as aggregation functions in an aggregation schema. Instead, the statistic() function may be used, which provides average, variance and standard deviation as derived values.
Besides, one may provide application specific (user-defined) aggregation rules (see "Aggregation functions"). Data created by user-defined aggregation functions must be aggregate-able, since functions also apply on higher level aggregations.
Usually, aggregation functions need a preceding collection name referring to the collection of instances to be aggregated. In order to calculate the aggregated value the function iterates through the collection (calling object). The parameter for the aggregation function is the attribute to be aggregated. The calling object (source collection) for an aggregation functions is determined by the aggregate operation, i.e. in an aggregation schema, no calling collection is required for aggregation functions. When a specific source collection had been defined in order to execute the grouping operation, this will be ignored for higher level aggregation.
All non-transient aggregation values that do not have got a source (assignment) are supposed to be evaluated by the Aggregate() function implemented for the aggregation model data type. When no Aggregate function has been implemented, the default Aggregate() function is called, which applies the sum() function on numerical attributes and the maximum() function on all other data types. The Aggregate() function is called after evaluating assigned attributes.
// use of statistic function
SDB_Statistic stat = statistic(income);
TRANSIENT INT(10,2) avr = stat.Average();
TRANSIENT INT(10,4) dev = stat.Deviation();
Transient properties are evaluated by an assigned operand or by means of context functions (read event handler). Operands assigned to transient properties in an aggregation (VIEW) schema or SELECT statement must be valid in the context of the result data type. Transient attributes with assigned operands are evaluated when accessing these attributes, i.e. after aggregations have been completed.
In order to be able to traverse aggregation hierarchies and for calculating higher aggregation levels, each instance in the aggregation collection provides a number of partition relationships. The number of partition relationships corresponds to the number of dimension attributes in grouping definition.
Names for partition relationships are derived from dimension attribute names preceding attribute names with __ (e.g. __personId, __projectId, __iDate in the example).
In order to provide links to related object instances (e.g. person or project in the example), one may also define transient references, which must be maintained in an appropriate read or update handler.