Background and terms
An aggregation hierarchy is based on a set of dimensions, typically defined as grouping attributes in a view definition. In general, the aggregation hierarchy includes all aggregation levels that result from combinations of the dimensions, including the total (which excludes all dimensions). Composite dimensions (such as iDate, composed of year, month and day) are typically used for dependent attributes, e.g. month usually has no meaning without year. To explain how aggregation works internally, the following terms are introduced:
- Aggregation instance ( I ) - An aggregation instance is an instance resulting from aggregation to a certain level. An aggregation instance consists of a level identifier, dimension values (key), aggregated values and subset collections (relationships).
- Level identifier ( L ) - The level identifier represents an aggregation level and is an 8-byte number, where each byte holds the dimension level of one dimension, in the order of the defined grouping attributes. Because the level identifier is restricted to 8 bytes, ODABA supports at most 8 dimensions.
- Dimension level ( L(i) ) - The dimension level is the aggregation level for the dimension i (i = 1 ... 8) in a level identifier. A numerically lower dimension level describes a higher aggregation: for date, dimension level 1 (year) represents a higher aggregation level than 2 (year, month) or 3 (year, month, day). The maximum dimension level supported is 9. Dimension levels are stored in the level identifier as readable characters (0 ... 9). Dimension level 0 indicates that the dimension is excluded and has no value on this level (for this level identifier).
- Maximum dimension level ( Dim(i) ) - The maximum dimension level for dimension i is the number of attributes the dimension consists of. A valid dimension level L(i) is a value between 0 (dimension excluded) and Dim(i).
- Aggregation levels ( {L} ) - The set of all level identifiers contains a list of all (distinct) L, where L(i) is valid for i=1,...,8. Since each level identifier identifies exactly one aggregation level and since each aggregation level has exactly one level identifier, {L} defines the complete set of aggregation levels (level identifiers). The aggregation level example below shows all aggregation levels for the example.
- Level successor ( L+1 ) - The successor is obtained by adding 1 to the level identifier, where a dimension level that has reached its maximum wraps to 0 and carries into the dimension to its left. Considering three dimensions whose second and third dimensions have maximum dimension levels 1 and 3, respectively, we get the following level successors beginning with the total (000): {L} = { 000, 001, 002, 003, 010, 011, 012, 013, 100, ... }. Each L in {L}, except the level identifier for the grouping level, has exactly one successor. The successor L+1 is a potential level from which L can be aggregated, but not the only one.
- Level distance ( D(L1,L2) ) - The level distance between two levels L1 and L2 is the number of aggregation steps needed to get from the lower level L2 to the higher level L1. D(L1,L2) is -1 when there exists at least one i with L2(i) < L1(i) (e.g. L1=010 and L2=101). Otherwise, D(L1,L2) = sum(L2(i) - L1(i): i=1,...,8). When D(L1,L2) is -1, L1 cannot be aggregated from L2.
- Level class number ( C(L) ) - The level class number for a level identifier is the sum of dimension levels: C(L) = sum(L(i): i=1,...,8). The level class number for total is 0. When D(L1,L2) is greater than 0, D(L1,L2)=C(L2) - C(L1), i.e. the difference between level class numbers of level identifiers in different level classes defines the minimum number of aggregation steps between level identifiers of these classes.
- Level instance count ( I(L) ) - The level instance count is the number of instances stored for a certain level identifier. Instances for a level identifier are all instances on the aggregation level defined by the level identifier. When the level has not yet been evaluated, I(L) is -1.
- Key value ( K ) - The aggregation key value consists of attribute values for all dimensions. Attribute values for excluded dimensions are empty. K provides a unique key for all instances in an aggregation collection, which contains aggregation instances on all levels. Level identifier and key value together provide a unique key in the aggregation collection.
- Key component value ( K(i) ) - A key component value is the part of the key value that belongs to dimension i. When the dimension level L(i) is 0, K(i) is considered an empty attribute value.
- Subset ( S(I,i) ) - The subset of an aggregation instance I with level identifier L and L(i) < Dim(i) is the collection of all aggregation instances I1 in the aggregation collection on the next lower aggregation level L1, where L1(i) = L(i)+1 and, for all j <> i, L1(j) = L(j) and K1(j) = K(j).
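The level arithmetic defined above (valid levels, {L}, successor, D(L1,L2) and C(L)) can be sketched in a few lines of Python. This is only an illustration of the definitions, not ODABA code: the function names are hypothetical, level identifiers are modelled as tuples of digits rather than 8-byte numbers, and the maximum dimension levels (1, 1, 3) are an assumed example in which the value for the first dimension is a guess. The carry rule in successor() is likewise an assumption read off the successor list 003 -> 010.

```python
from itertools import product

# Assumed example: three dimensions with Dim = (1, 1, 3).
# ODABA itself allows up to 8 dimensions; three keep the output short.
DIMS = (1, 1, 3)


def is_valid(level, dims=DIMS):
    """A level identifier is valid when 0 <= L(i) <= Dim(i) for every i."""
    return len(level) == len(dims) and all(0 <= l <= d for l, d in zip(level, dims))


def all_levels(dims=DIMS):
    """Enumerate {L} in successor order, starting with the total (all zeros)."""
    return list(product(*(range(d + 1) for d in dims)))


def successor(level, dims=DIMS):
    """Add 1 to the rightmost dimension level, carrying past Dim(i).

    Assumption: mixed-radix counting, as suggested by 003 -> 010.
    Returns None for the grouping level, which has no successor.
    """
    digits = list(level)
    for i in reversed(range(len(digits))):
        if digits[i] < dims[i]:
            digits[i] += 1
            return tuple(digits)
        digits[i] = 0  # wrap and carry into the dimension to the left
    return None


def class_number(level):
    """C(L): the sum of all dimension levels."""
    return sum(level)


def distance(l1, l2):
    """D(L1, L2): aggregation steps from lower level L2 up to L1, or -1."""
    if any(b < a for a, b in zip(l1, l2)):
        return -1  # some L2(i) < L1(i): L1 cannot be aggregated from L2
    return sum(b - a for a, b in zip(l1, l2))


def fmt(level):
    """Render a level tuple as readable digits, e.g. (0, 1, 3) -> '013'."""
    return "".join(str(d) for d in level)


if __name__ == "__main__":
    print([fmt(l) for l in all_levels()[:9]])
    print(distance((0, 1, 0), (1, 0, 1)))
```

Under these assumptions, the first nine entries printed by the script reproduce the successor list given for the level successor, and the distance for L1=010 and L2=101 is -1.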
These notations will be used in subsequent topics to describe the aggregation algorithms in ODABA.
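As a small illustration of the instance-related notation (I, K, K(i) and S(I,i)), the following Python sketch looks up a subset in an in-memory list. It is not how ODABA stores aggregation collections: the Instance class, the subset() function and the sample data are hypothetical, and dimensions are indexed from 0 rather than 1. It also assumes that, in dimension i, the key component of a subset instance extends the key component of I (a month belongs to the subset of its own year); the definition above states this condition only for the other dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    level: tuple  # L: one dimension level per dimension
    key: tuple    # K: one component K(i) per dimension, () when L(i) == 0
    value: int    # a single aggregated value, for illustration only


def subset(instance, i, collection, dims):
    """Return S(I, i): instances one level lower in dimension i (0-based)."""
    if instance.level[i] >= dims[i]:
        return []  # L(i) == Dim(i): no lower level in this dimension
    target = list(instance.level)
    target[i] += 1  # L1(i) = L(i) + 1, all other L1(j) = L(j)
    target = tuple(target)
    result = []
    for other in collection:
        if other.level != target:
            continue
        # K1(j) = K(j) for all j <> i
        same_elsewhere = all(
            other.key[j] == instance.key[j] for j in range(len(dims)) if j != i
        )
        # Assumption: in dimension i the child key extends the parent key.
        parent = instance.key[i]
        extends = other.key[i][: len(parent)] == parent
        if same_elsewhere and extends:
            result.append(other)
    return result


# Hypothetical data: dimension 0 is a date (year, month), dimension 1 a region.
dims = (2, 1)
total = Instance((0, 0), ((), ()), 60)
y2023 = Instance((1, 0), ((2023,), ()), 35)
y2024 = Instance((1, 0), ((2024,), ()), 25)
m2023_01 = Instance((2, 0), ((2023, 1), ()), 15)
m2023_02 = Instance((2, 0), ((2023, 2), ()), 20)
m2024_01 = Instance((2, 0), ((2024, 1), ()), 25)
collection = [total, y2023, y2024, m2023_01, m2023_02, m2024_01]

if __name__ == "__main__":
    for child in subset(y2023, 0, collection, dims):
        print(child.level, child.key, child.value)
```

In this sample data, the values in each subset add up to the value of the parent instance, which is the relationship an aggregation step from the lower to the higher level relies on.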