Basics of reporting.2. Report groups

Reports used in real-world business are more than simple lists of rows and columns. To make the presented data informative, the report data is sorted and condensed into separate sections that correspond to the logical structure of the report. Report engines therefore introduced the concept of groups to support the structuring of report data.

With groups, a set of rows, the report data set, is subdivided into an ordered collection of smaller subsets of rows. All rows of a group instance share a common attribute. In the domain of relational databases, the attribute is defined by the name of one or more columns. Within the group instance, all those columns have the same value. That set of attributes, which identifies a group, is called the group key.

Groups in reporting can usually be mapped to 'is-part-of' relations in the real world. Employees could be grouped by the department they work in, or by the first letter of their last name.
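
A group key can be pictured as the list of values a row holds in the key columns. The following is only a minimal sketch of that idea: the class GroupKeys, the helpers keyOf and firstLetterOfLastName, and the two-column row layout are assumptions made for this illustration and are not part of JFreeReport.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupKeys
{
  // hypothetical column layout, chosen for this example only
  public static final int DEPARTMENT_COL = 0;
  public static final int LAST_NAME_COL = 1;

  /** Builds the group key of a row: the values of all key columns, in order. */
  public static List<Object> keyOf (final Object[] row, final int... keyColumns)
  {
    final List<Object> key = new ArrayList<Object>();
    for (final int column : keyColumns)
    {
      key.add(row[column]);
    }
    return key;
  }

  /** A derived key: the first letter of the last name (assumes a non-empty name). */
  public static String firstLetterOfLastName (final Object[] row)
  {
    return String.valueOf(row[LAST_NAME_COL]).substring(0, 1);
  }

  public static void main (final String[] args)
  {
    final Object[] john = { "Sales", "Doe" };
    final Object[] arthur = { "Marketing", "Dent" };

    // different departments: the rows belong to different group instances
    System.out.println(keyOf(john, DEPARTMENT_COL).equals(keyOf(arthur, DEPARTMENT_COL)));
    // same first letter: the rows would share a group instance
    System.out.println(firstLetterOfLastName(john).equals(firstLetterOfLastName(arthur)));
  }
}
```

Two rows belong to the same group instance only if their keys are equal, so the choice of key columns decides which 'is-part-of' relation the report expresses.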

Basics of reporting.2.1. The Control-Break-Algorithm

Reporting can be a very resource-intensive process. Data has to be queried from databases and other sources; the data must be processed and sorted, intermediate results must be computed, and finally the report output must be generated. Even with today's powerful computers, performance is key to customer satisfaction. To fulfill these performance requirements, it is unacceptable to waste time building a global view of the report data unless absolutely necessary.

For performance reasons, group memberships are computed at runtime with an algorithm called the 'control-break algorithm'. Understanding this algorithm is the key to successful reporting, as it heavily influences how the report engine behaves and how groups are built up. The control-break algorithm does not build a global view over all group instances. To detect the end or start of a group, it compares the current row with the previously read row of data. If the value of any of the group's key columns differs, the current group must be finished. If the report processing has not reached the last row, a new group instance is opened immediately. This new group remains active until either the end of the report data is reached or the group key values change.

Definition:

A group is a contiguous set of rows in which all rows share the same group key.

A group instance is identified by a group key. The key consists of one or more attributes, where all entities in that group have the same values stored in the specified attributes. When using databases, entity sets are usually represented by tables and attributes are represented by the table's columns.

A sub group is a group whose key contains all attributes of its parent's key and at least one more attribute. Therefore, a sub group contains a subset of its parent's rows.

A group with a group key without attributes defined is called the default group. This group has a single instance which always spans the complete report data set.
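
The relationship between a parent key, a sub group key and the default group can be illustrated with plain sets of column names. This is a sketch under stated assumptions: GroupKeyRelations, isSubGroupKey and key are hypothetical names invented for the example, and real group definitions in a report engine carry more information than a set of names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class GroupKeyRelations
{
  /**
   * Checks the definition given above: the sub group key contains all
   * attributes of the parent key and at least one additional attribute.
   */
  public static boolean isSubGroupKey (final Set<String> parentKey,
                                       final Set<String> candidateKey)
  {
    return candidateKey.containsAll(parentKey)
        && candidateKey.size() > parentKey.size();
  }

  /** Convenience helper: builds a key from column names; no names gives the default group key. */
  public static Set<String> key (final String... columns)
  {
    return new LinkedHashSet<String>(Arrays.asList(columns));
  }

  public static void main (final String[] args)
  {
    final Set<String> defaultGroup = key();
    final Set<String> location = key("location");
    final Set<String> locationAndDepartment = key("location", "department");

    System.out.println(isSubGroupKey(location, locationAndDepartment));  // true
    System.out.println(isSubGroupKey(locationAndDepartment, location));  // false
    System.out.println(isSubGroupKey(defaultGroup, location));           // true
  }
}
```

Under this definition every group with at least one attribute qualifies as a sub group of the default group, which matches the statement that the default group spans the complete report data set.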

The single level Control-Break-Algorithm as pseudo code:

OPEN data-set.
READ record.

PRINT report-header.

WHILE NOT END-OF-FILE 
DO
   VAR groupKey := GET-CURRENT-GROUP-KEY.
   PRINT group-header.

   WHILE (NOT END-OF-FILE AND
          groupKey IS-EQUAL-TO GET-CURRENT-GROUP-KEY)
   DO
      PRINT item-band.
      READ record.
   DONE.

   PRINT group-footer.

DONE.

PRINT report-footer.
CLOSE data-set.

The same algorithm in Java (the complete code for this listing can be found in CVS):

  public static void main (final String[] args)
  {
    final Object[][] data = initData();

    // cursor at the first row ...
    // in this example modifying the 'position' is equal to a 'read' operation 
    int position = 0;

    printReportHeader(data, position);
    while (isEndOfFile (data, position) == false)
    {
      // remember the group key of the first row of this group instance;
      // if the group key changes, we have to do a control break ...
      final Object groupKey = data[position][CONTINENT_COL];
      printGroupHeader(data, position);

      while ((isEndOfFile (data, position) == false) &&
             (data[position][CONTINENT_COL].equals(groupKey)))
      {
        // the row still belongs to the current group instance:
        // print the items ...
        printItems (data, position);

        // now 'read' the next row of data ...
        position += 1;
      }
      printGroupFooter(data, position - 1);
    }

    printReportFooter(data, position - 1);
  }

From this algorithm, we can derive some basic principles about groups as they are used in JFreeReport.

  1. Data used in group keys must be sorted according to the group hierarchy definition.

    Because only neighbouring rows are compared with each other, all rows with a particular group key end up in the same group instance only if they are direct neighbours in the report data set.

  2. A group with no attributes defined will have a single instance, which spans the complete report data set.

  3. As long as there is data available, a new group instance will be opened immediately after the previous instance has been closed down. It is not possible to print the item band without having an open group.

  4. The order of the attributes within a group definition is not important. For the algorithm it matters only that an attribute value has changed, not which one.
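
The first two principles can be tried out with a small stand-alone sketch. It is not the JFreeReport implementation: the class ControlBreakDemo, its split method and the one-column sample data are assumptions made for this illustration, and cell values are assumed to be non-null. The sketch compares each row only with its predecessor, exactly as the listing above does.

```java
import java.util.ArrayList;
import java.util.List;

public class ControlBreakDemo
{
  /**
   * Splits the rows into group instances by comparing each row only with
   * its predecessor. Returns the instances in the order they were opened.
   */
  public static List<List<Object[]>> split (final Object[][] data, final int... keyColumns)
  {
    final List<List<Object[]>> groups = new ArrayList<List<Object[]>>();
    List<Object[]> current = null;
    for (int position = 0; position < data.length; position++)
    {
      if (current == null || !sameKey(data[position - 1], data[position], keyColumns))
      {
        // control break: open a new group instance
        current = new ArrayList<Object[]>();
        groups.add(current);
      }
      current.add(data[position]);
    }
    return groups;
  }

  /** True if both rows hold equal values in all key columns. */
  private static boolean sameKey (final Object[] previous, final Object[] row,
                                  final int[] keyColumns)
  {
    for (final int column : keyColumns)
    {
      if (!previous[column].equals(row[column]))
      {
        return false;
      }
    }
    return true;
  }

  public static void main (final String[] args)
  {
    final Object[][] sorted = { { "Europe" }, { "Europe" }, { "Asia" } };
    final Object[][] unsorted = { { "Europe" }, { "Asia" }, { "Europe" } };

    System.out.println(split(sorted, 0).size());    // 2
    System.out.println(split(unsorted, 0).size());  // 3: 'Europe' is opened twice
    System.out.println(split(unsorted).size());     // 1: no key columns, one instance
  }
}
```

With the unsorted input the value 'Europe' produces two separate group instances, which is why the data must be sorted, or at least kept together, by the group key columns before the report runs.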

By considering the group data to be a new data set, we can stack multiple control-break runs into each other. These multi-level control breaks add some new behavioural constraints to the engine.

  1. A sub group can only be opened after its parent group has been opened.

    Subgroup processing starts as soon as the group header of the parent group has been printed. Immediately after the subgroup processing finishes, the parent group's footer is printed. The processing flow cannot reach the subgroup without passing through the parent group.

  2. A sub group must cede control as soon as one of the parent group's attributes changes.

    A control break in one of the parent groups will close all subgroups. As soon as the parent group has opened a new group instance, new instances of the sub groups will be opened as well.

    Therefore sub groups must be aware of the group attributes of their parents.

  3. Adding an attribute that has a constant value to the group definition will not alter the number or order of the generated group instances.

  4. A group can only have one directly attached sub group.
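
The constraints above can be observed in a small multi-level sketch. It is an illustration under stated assumptions, not the engine's code: the class NestedControlBreak, the trace method and the event texts are invented for this example, each nesting level is assumed to add exactly one column to the group key, and cell values are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedControlBreak
{
  /**
   * Runs a multi-level control break. levelColumns[i] is the column added
   * to the group key at nesting level i, so the key of level i consists of
   * levelColumns[0..i]. Returns the sequence of events as text lines.
   */
  public static List<String> trace (final Object[][] data, final int... levelColumns)
  {
    final List<String> events = new ArrayList<String>();
    for (int position = 0; position < data.length; position++)
    {
      // outermost level whose key changed compared to the previous row
      final int breakLevel = (position == 0)
          ? 0 : firstChangedLevel(data[position - 1], data[position], levelColumns);

      if (position > 0)
      {
        // close groups from the innermost level up to the break level
        for (int level = levelColumns.length - 1; level >= breakLevel; level--)
        {
          events.add("end level " + level);
        }
      }
      // open groups from the break level down to the innermost level
      for (int level = breakLevel; level < levelColumns.length; level++)
      {
        events.add("start level " + level + ": " + data[position][levelColumns[level]]);
      }
      events.add("item " + position);
    }
    if (data.length > 0)
    {
      // end of data: close all groups that are still open, innermost first
      for (int level = levelColumns.length - 1; level >= 0; level--)
      {
        events.add("end level " + level);
      }
    }
    return events;
  }

  /** Returns the first level whose column differs, or the level count if none does. */
  private static int firstChangedLevel (final Object[] previous, final Object[] row,
                                        final int[] levelColumns)
  {
    for (int level = 0; level < levelColumns.length; level++)
    {
      if (!previous[levelColumns[level]].equals(row[levelColumns[level]]))
      {
        return level;
      }
    }
    return levelColumns.length;
  }

  public static void main (final String[] args)
  {
    final Object[][] data = {
        { "Denver", "Sales" }, { "Denver", "Marketing" }, { "New York", "Marketing" } };
    for (final String event : trace(data, 0, 1))
    {
      System.out.println(event);
    }
  }
}
```

In the sample run the second and third rows both hold 'Marketing' in the inner key column, yet the inner group is still closed and reopened between them because the outer column changes from 'Denver' to 'New York'. Representing the levels as a simple array also reflects the last constraint: each group has at most one directly attached sub group.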

Basics of reporting.2.2. Working with groups: Examples

The effects of the control break algorithm are best understood by looking at an example. Let's take the following table as the data source.

Table Basics of reporting-1. Initial data set

Location  Department  Employee       Salary
--------  ----------  -------------  -------
Denver    Sales       John Doe       100.000
Denver    Sales       Jane Doe       100.000
Denver    Marketing   Arthur Dent    125.000
New York  Marketing   Adam Johnson   125.000
New York  Marketing   Eve Lynn       145.000
New York  Management  J.D. Salinger  500.000

When grouping the table by the 'department' column, we'll get the following three group instances.

Table Basics of reporting-2. Data set grouped by 'Department'

Location  Department  Employee       Salary   Notes
--------  ----------  -------------  -------  ----------------------------------
                                              group start for 'department group'
Denver    Sales       John Doe       100.000
Denver    Sales       Jane Doe       100.000
                                              group end for 'department group'
                                              group start for 'department group'
Denver    Marketing   Arthur Dent    125.000
New York  Marketing   Adam Johnson   125.000
New York  Marketing   Eve Lynn       145.000
                                              group end for 'department group'
                                              group start for 'department group'
New York  Management  J.D. Salinger  500.000
                                              group end for 'department group'

Within each group instance, the value of the 'department' column is the same for all rows of that group. We get one group instance for each department. Note that all rows of a department are direct neighbours in the data set, as the control-break algorithm requires.

Now, let's use the 'location' column as the group key. We now get two group instances, 'Denver' and 'New York'.

Table Basics of reporting-3. Data set grouped by 'Location'

Location  Department  Employee       Salary   Notes
--------  ----------  -------------  -------  ----------------------------------
                                              group start for 'location group'
Denver    Sales       John Doe       100.000
Denver    Sales       Jane Doe       100.000
Denver    Marketing   Arthur Dent    125.000
                                              group end for 'location group'
                                              group start for 'location group'
New York  Marketing   Adam Johnson   125.000
New York  Marketing   Eve Lynn       145.000
New York  Management  J.D. Salinger  500.000
                                              group end for 'location group'

Of course, we can combine groups to create multi-level reports. The control break algorithm allows only one group definition per level. That means we cannot have a report that has top level groupings for 'location' and 'department' at the same time. But we are able to subdivide groups in an ordered way.

For example, we can first group by 'location' and, in a second step, subdivide each location by department. This two-level grouping produces the following layout:

Table Basics of reporting-4. Data set grouped by 'Location' and subgrouped by 'department'

Location  Department  Employee       Salary   Notes
--------  ----------  -------------  -------  ----------------------------------
                                              group start for 'location group'
                                              group start for 'department group'
Denver    Sales       John Doe       100.000
Denver    Sales       Jane Doe       100.000
                                              group end for 'department group'
                                              group start for 'department group'
Denver    Marketing   Arthur Dent    125.000
                                              group end for 'department group'
                                              group end for 'location group'
                                              group start for 'location group'
                                              group start for 'department group'
New York  Marketing   Adam Johnson   125.000
New York  Marketing   Eve Lynn       145.000
                                              group end for 'department group'
                                              group start for 'department group'
New York  Management  J.D. Salinger  500.000
                                              group end for 'department group'
                                              group end for 'location group'

The order of the groups is important. It makes a difference whether one groups first by location and then by department, or first by department and then by location. As you can see, a subgroup is always part of its parent group. When a subgroup's header is printed, the parent group's header has already been fully processed. For footers, the footer of the subgroup is always printed before the parent group's footer.

Now we switch the group order: the 'department' group becomes the top level group, followed by the 'location' subgroup.

Table Basics of reporting-5. Data set grouped by 'department' and subgrouped by 'location'

Location  Department  Employee       Salary   Notes
--------  ----------  -------------  -------  ----------------------------------
                                              group start for 'department group'
                                              group start for 'location group'
Denver    Sales       John Doe       100.000
Denver    Sales       Jane Doe       100.000
                                              group end for 'location group'
                                              group end for 'department group'
                                              group start for 'department group'
                                              group start for 'location group'
Denver    Marketing   Arthur Dent    125.000
                                              group end for 'location group'
                                              group start for 'location group'
New York  Marketing   Adam Johnson   125.000
New York  Marketing   Eve Lynn       145.000
                                              group end for 'location group'
                                              group end for 'department group'
                                              group start for 'department group'
                                              group start for 'location group'
New York  Management  J.D. Salinger  500.000
                                              group end for 'location group'
                                              group end for 'department group'

As we can see, whenever the department changes, the location group is closed down and then reopened once the department group has generated a new instance.