Aggregating data - Qlik View

While QlikView shines in dealing with massive data volumes, sometimes we just do not need to load everything at an atomic level. Data aggregation can, for example, be used in deployments where document segmentation by detail is needed, in which case two documents are created to serve different user groups and analysis needs: one document will have all data with the highest level of detail and another one will have a similar data model but with aggregated (reduced) tables. This way, users are better served by keeping a balance between performance and analysis needs.

In this section, we will implement a document segmentation scenario by aggregating the Flight Data table to create a second document intended for executive users, who only require summary data.

Aggregating the Flight Data table

When aggregating data, the first step is always to define which dimension fields will be left out and which ones will be kept in the summarized table. We should analyze this question by looking at the data from the ground up, that is, by reviewing each dimension from the most granular to the most general. The following list shows the most important dimension fields in the Flight Data table, sorted by granularity:

  • Airport (Origin and Destination)
  • City
  • State
  • Country
  • Aircraft Type
  • Aircraft Group
  • Airline / Carrier
  • Carrier Group
  • Region
  • Month
  • Quarter
  • Year

If we analyze how removing each dimension would individually affect the result of the summarization process, we can find that the most impact would come from removing the Airport dimensions, both Origin and Destination, since those are the ones with the greatest granularity. At the same time, we can say that the Airport dimension does not add much value to the analyses we are looking to deliver in our document, so it's a good choice to leave it out.

Dropping dimensions from the data directly impacts the analyses that can be made in the resulting QlikView document. I Therefore, the decision to leave out certain fields for the sake of I summarization should always be discussed with the end user.

We could remove additional dimensions, for example, Aircraft Type or Carrier, but as we move up the detail ladder to the most general dimensions, those dimensions become more and more important to accomplish different analyses.

We must add that leaving dimensions out should be a thorough decision process, thinking both in terms of analytical requirements and the aggregation rate we can achieve. For example, removing the Country dimension would not result in any substantial aggregation if we keep the State field. Also, what happens if we remove the Airport dimensions but keep Origin City and Destination City? What happens is, not surprisingly, that the table will not be significantly reduced since both fields keep a close relation and their granularity is almost the same (there is only one airport in most cities). Therefore, and for the sake of simplicity, we will also leave out all city, state, and country fields.

Finally, before proceeding, we should keep in mind how many records the original table has, in order to be able to measure how much reduction we achieved in the summarization. In our case, the Flight Data table originally contains 1,256,075 rows.

Moving on to the aggregation process, follow these steps:

  1. Create a new QlikView document and save it inside the 2.Workbooks folder with the name Transform - Flight Data.qvw.
  2. Go to the Script Editor window, click on the Table Files, button in the tool pane and navigate to the 3.QVDSource folder.
  3. Select the Flight Data.qvd file and click Finish on the File Wizard window.
  4. From the generated Load script, find the lines corresponding to those fields related to origin and destination airports and erase them. The fields we should remove are:
    • %Origin Airport ID
    • %Origin Airport Sequence ID
    • %OriginAirport Market ID
    • %OriginWorld Area Code
    • %Destination Airport ID
    • %DestinationAirport Sequence ID
    • %DestinationAirportMarket ID
    • %DestinationWorld Area Code Distance
    • Origin Airport Code
    • Origin City
    • Origin State Code
    • Origin State FIPS
    • Origin State
    • Origin Country Code
    • Origin Country
    • Destination Airport Code, Destination City
    • Destination State Code
    • Destination State FIPS
    • Destination State
    • Destination Country Code Destination Country
    • From - To AirportCode
    • From - To AirportID
    • From - To City
    • From - To State Code
    • From - To State
  5. Next, from the list of fields we have kept, we need to identify those that are dimensions and those that are measures. Our measure fields are:
    • # Departures Scheduled
    • # Departures Performed
    • # Payload
    • #Available Seats
    • # Transported Passengers
    • # Transported Freight
    • # Transported Mail
    • # Ramp-To-Ramp Time
    • # Air Time
  6. The aggregation functions will be applied to these fields, that is, we will sum the # of Departures, or sum the # Transported Passengers. Identify where each of the listed fields are in the created load statement and replace the field name with the following expression:
    Sum(Field Name) as Field Name where Field Name represents each of the listed measures.
    Be careful not to remove the comma that separates eachIfield definition and remove the comma from the last listed field, before the From keyword.
  7. Finally, we will add a Group By clause to the end of the Load statement, and list all dimension fields that have been kept in the script, separated by a comma.
  8. We will also add a table name preceding the Load keyword.
  9. In the end, the aggregation script will look like this:
  10. Next, we will just save the changes and reload the script.The resulting table will turn our 1,256,075 rows into only 100,091. A brief example of what just happened is shown in the following screenshot:

    Aggregating-data-Figure

    Notice how the totals remain the same for both tables.

A smaller table will occupy fewer resources (RAM and CPU) and, therefore, calculations will be faster. If the performance gain attained with data aggregation doesn't mean reducing business value and/or functionality for the end user, then it's a winning approach any day.

The Transformation output

We have loaded the base QVD containing flight data and transformed it by applying aggregations, now what? Well, the next steps would be to store the transformed table, using the store command, into a new QVD file that will reside in the 3.QVDs Transformed folder.

After that, a new data model could be created in the Presentation Layer based on the Airline Operations document, but using the newly aggregated QVD and without the Origin and Destination dimensions. This new QlikView document is intended to serve the users who only need summarized information about the Airline Operations document.

Aggregation functions

Of course, QlikView offers more aggregation options than summing. The most commonly used options are shown in the following table:

Aggregation functions


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Qlik View Topics