Design challenges of data modeling - Qlik View

We will now provide some useful tips for dealing with more complex data models, specifically those used for dealing with multiple fact tables in a single QlikView document.

We will take a hands-on approach and we will continue using the Airline Operations document that we have previously used. The design challenge we will describe in this section will be aimed at integrating a new fact table into our data model, one that contains Airline Employment Statistics.

The Airline Employment statistics table

The table we will be adding to the data model contains monthly information about the number of employees per airline, separated by full time and part time, and also showing the equivalent total of full time employees.

The table contains the following fields:

  • Year
  • Month
  • %Airline ID
  • %Unique Carrier Code
  • Unique Carrier
  • Carrier Code
  • Carrier Name
  • %Carrier Group ID
  • # FullTime Employees
  • # PartTime Employees
  • # Equivalent FTEs
  • # Total Employees
  • Period
  • Month (#)

As you can see, all of the dimension fields are already part of the Airline Operations data model and the only new fields are those related to the actual measurements of employment.
Integrating multiple fact tables into one data model is one of the main challenges we can come across when designing a data model. This is, in fact, a very common scenario and we will present two ways of solving the task at hand:

  1. By concatenating the two fact tables into one.
  2. By creating a link table.

Let's see how each of these methods work and how the integration of this table will take place.

Concatenating fact tables

The first approach suggests that we should combine the two fact tables into one.
This is a valid approach as all of the dimension fields of the new table are also present in the initial table.

We have already used the concatenate function in the previous example, and we will use it again to combine both fact tables.

This method will keep our data model simplified because there will be virtually no additional tables in it. However, we should keep in mind one important consideration: structural asymmetry.

Structural asymmetry

Although it is true that all of the dimension fields contained in the Employment table are present in the Main Data table, the opposite is not true: not all of the dimension fields in the Main Data table are present in the Employment table. There is an asymmetric structure between them.

This structural asymmetry needs to be kept in mind when creating the frontend of the document, simply because there will be analyses across certain dimensions that will be impossible to make. For example, we will not be able to create a chart that shows the number of employees by airport, as that dimension (airport) is not present on the Employment table, and there is no way to get that data into our document. However, we will be able to create, for instance, a chart that shows the number of employees by airline or by month or by year.

In hand with this difference in table structure, there is another point we should address: the Main Data table already contains Year, Month, and Quarter dimensions, and those dimensions are included as listboxes in the user interface to allow the user to filter through the data. However, the Employment data only contains the Year and Month fields, but not the Quarter field.

There is something we can do about this: simply add a calculated field to the Employment table before concatenating it to the Main Data table. In the end, both table segments contain the Quarter field. If we do not add the Quarter field to the Employment table, any user selection on this field will automatically exclude all of the employment data.

Natural and forced concatenation

Moving on to concatenating tables, as we saw in the previous chapter, there are two ways in which this operation can take place in QlikView:

  • Natural concatenation
  • Forced concatenation

Let's take a moment to revisit this subject and see how we can apply it to our fact tables.

Natural concatenation

Natural concatenation happens when two tables are loaded with exactly the same structure, that is, they contain the exact same fields (in both the number of fields and field names).

When this condition is met, QlikView automatically combines all tables that are similar and treats them as one logical table.

An example of this is shown in the next script:

As you can see, with the preceding script we are loading two tables into the data model and both tables are loaded from a different QVD file. The first table contains data for the year 2011, while the second table contains data for 2010. As they both have the same structure (field names), they will automatically be merged into one logical table in the QlikView data model. Also note that the order in which the fields are defined in the load script is not relevant for natural concatenation.

No Concatenate

In case we want to avoid the default behavior whenever this circumstance is present in the script, we can add the No Concatenate keyword as a prefix to the Load statement of the second table so that QlikView continues treating them as separate tables in the data model.

Forced concatenation

Forced concatenation happens when we explicitly define that two tables should be combined into one logical table in the data model, even if they don't have the same structure or field names.

This is the method we have used earlier in this chapter, where we used the Concatenate prefix to add the 'unknown' Aircrafts to the Aircrafts dimension table. It will be useful in this case as well as there are only a few shared fields between both tables.

As we also saw earlier, with the Concatenate prefix we can specify to which of the previously loaded tables the new table should be appended. This is done by adding the name assigned to the target table, enclosed by parentheses.

The following script shows how to explicitly concatenate two tables that do not have the same structure:

Sales:LoadRegion,Month,Year,[Total Amount]From Sales2 011.qvd (qvd);Concatenate (Sales)LoadRegion,[Total Budget Amount],Month,YearFrom Budget2011.qvd (qvd);

If we don't add the name of the original table (Sales, in the preceding example) to the Concatenate prefix, the new table will be concatenated to the table loaded immediately before it, no matter what that table is.

We recommend, as a best practice, to always explicitly define concatenation by adding the name of the target table to the Concatenate prefix, even if both tables have the same structure and would naturally be combined. This is mainly to avoid confusion and makes it easier for other developers, and yourself, to understand the script.

Concatenating the Employment Statistics table

Now that we've revisited the subject of table concatenation and described the considerations that we must keep in mind, let's put it into practice. We will be integrating the Employment table into the already designed data model for the Airline Operations document.
Follow these steps:

  1. Make sure the T_F41SCHEDULE_P1A_EMP.qvd file is placed into the Airline OperationsData FilesQVDs folder.
  2. Open the Airline Operations.qvw document we've been working with.
  3. Save the file with another name. Let's call it Chapter (Data Modeling Best Practices)_Concatenated tables.qvw.
  4. Go to the Edit Script window, activate the Main Data tab, and click on the Add new tab button from the toolbar.
  5. The Tab Rename dialog window will appear, in which we will enter Employment Data and click on OK.
  6. The new tab will be added to the right of the Main Data tab, which is particularly important for our example.
  7. Using the File Wizard dialog (click on the Table Files, button), create the Load statement for the T_F41SCHEDULE_P1A_EMP.qvd file. Make sure the Load statement is added on the Employment Data tab created previously.
  8. Add the new Quarter field as a calculated field to address part of what we discussed about structural asymmetry. The expression we will use for this is:
    'Q' &Ceil([Month (#)]/3, 1) as Quarter
  9. Add the Concatenate prefix to the load statement, specifying that the Main Data table is the one to which this will be appended. The added script should look as follows:
    It is of fundamental importance that this script be added in a tab that is to the right of the Main Data tab, as we are referencing the Main Data table and. For that to work, the table must have been loaded previously, during the script execution. Remember that script executes sequentially from left to right.
  10. Save the changes and reload the script.

As no actual tables have been added to the data model, the resulting model will look identical to the one we had before adding the script. The only difference will be the new fields that are included at the end of the Main Data table that correspond to the employment measures. Use the Table Viewer dialog to verify that the new fields have been added.

We have described the first approach for dealing with multiple fact tables in a data model. In the next section, we will present yet another option along with its pros and cons, so that you, as a Developer, can better decide which one will suit your needs best.

To continue, save and close the Chapter (Data Modeling Best Practices) _ Concatenated tables.qvw document.

Working with link tables

When we include two or more fact tables in a single QlikView document, it's very likely they all are somehow related and will, therefore, have some common dimension fields among them. However, as we've outlined before, in a QlikView data model, two tables should not be associated through two or more fields because it would generate a synthetic key.

So how do we incorporate two or more fact tables into one data model and treat them as two separate logical tables while, at the same time, avoiding the synthetic- key issue? At first sight, it can seem like both options are mutually exclusive, but there is a workaround which is to create a Link Table.

As its name implies, a link table essentially "links" two or more fact tables by taking all common fields out of the original tables and placing them into a new one (the link table).The new link table contains all possible combination of values for that set of fields and, through a unique key, is associated to the original tables.

A link table example

Take, for example, the following scenario:

  • We are required to design a data model for analyzing Call Center Performance data, and have two fact tables: Operations and Payroll. Based on these tables, we need to be able to present cross-functional information in a QlikView dashboard.
  • The Operations table has the following fields: Call ID, Timestamp, Employee ID, Supervisor ID, Department ID, Call Type ID, Customer ID, Call Duration, and Total Hold Time.
  • The Payroll table has the following fields: Payroll ID, Employee ID, Department ID, Position ID, Amount.
  • We also have the corresponding dimension tables to provide a description to the fields Call Type ID, Employee ID, Department ID, and Position ID.

If we let QlikView make the default associations, we would get the following data model:

A link table example

From the preceding image, we can observe that a synthetic key has been created because two fields (Employee id and Department id) are shared between the Operations table and the Payroll table.

To solve this challenge, we will remove the synthetic key by using a link table. As was mentioned earlier, the new link table will hold all combinations of the key fields that are common for both tables. We should also create a new compound key to connect the three tables.

Essentially, the link table replaces the synthetic key table, and in some cases, the result in both performance and design is exactly the same. However, it's a good practice to always "clean" the data model and remove all synthetic keys.

Our re-designed data model will look as follows, after applying the appropriate changes:

Design-challenges-of-data-modeling

As you can see, the synthetic key has been removed and the Payroll and Operations tables are now connected to the Department and Employees dimension tables indirectly and through the link table.

When designing a data model using this method, we should always consider the following:

  • The link table can become very large depending on the number of possible combinations that exist between or among the fields it's composed of.
  • We must make sure that all of the combinations that exist in both fact tables are also in the created link table. If they are not, the association between the fact tables and the link tables might be "broken" or missing for some records.
  • For performance optimization, the link table should not have repeated records. Only one record per possible combination of values is needed.
  • If the link table becomes immense, remember that QlikView needs to "walk" through the defined path between tables each time it needs to look for associations. An additional "hop" on this path, especially if it is through a large table, might slow calculation times for the end user.

Creating a link table in the Airline Operations document

Now that we've described the creation of link tables and its uses, let's put it into practice in our Airline Operations data model.

Remember that, in the previous section, we already added the Employment Statistics table to the Airline Operations data model by using the Concatenation method. We'll now do the same, but this time using the link table method. You can compare each of them and see for yourself their pros and cons.

Follow these steps:

  1. Make sure the T_F41SCHEDULE_P1A_EMP.qvd file we used previously is already in the Airline OperationsData FilesQVDs folder.
  2. Open the Airline Operations.qvw document.
  3. Save the file with another name. Let's call it Chapter (Data Modeling Best Practices)_Link tables.qvw.
  4. Head on to the script editor and add a new tab to the right-side of the Main Data tab. Name the new tab as Employment Data.
  5. Using the File Wizard (click on the Table Files... button), create the Load statement for the T_F41SCHEDULE_P1A_EMP.qvd file. Name the table as
    Employment Statistics.
  6. Identify the common fields between the Employment Statistics table and the already loaded Main Data table. The shared fields are:
    • Year, Period, Month, Month (#), %Airline ID, %Unique Carrier Code, Unique Carrier, Carrier Code, Carrier Name, and %Carrier Group ID.
  7. For all of the shared fields listed above, identify those that will form a unique key. In this case, the fields that must be included in the unique key are:
    • Period, %Airline ID, %Unique Carrier Code, and %Carrier Group ID.
  8. The fields we will leave out of the key will be:
    • Year, Month, Month (#), Unique Carrier, Carrier Code, and Carrier Name.
  9. Create a new compound key on both of the tables by using the following script expression:
    Period& ' | ' & [%Airline ID]& ' | ' & [%Unique Carrier Code]& ' | ' & [%Carrier Group ID] as [%Key Field],
  10. Adda new tab from the Tab menu and name it Link Table. Make sure this new tab is located to the right-side of both the Employment Data and Main Data tabs.
  11. In the Link Table tab, add the following script:
  12. With the preceding script we are doing the following:

1. Assign a name to the table: Link Table.

2. Create a list of distinct combinations of all shared fields from the previously loaded Main Data table, including the new %Key Field, by performing a Resident Load (more on Resident Loads in Chapter Advanced Data Transformation).

3. Create a duplicate of the %Key Field attribute and name it %temp Key Field.

4. Concatenate a new list of distinct combinations of all shared fields from the previously created Employment Statistics table, including the new %Key Field attribute and adding a calculated field Quarter. From this new list we exclude all combinations that already exist on the first list earlier, using a Where clause.

5. Remove the %temp Key Field field from the data model, as it was only to be used in the Where clause.

6. Remove the shared fields from each fact table, except the %Key Field attribute, as they will now be stored in the link table.

13. Save the changes we made and reload the script. The new data model will be created and look as follows:

Design-challenges-of-data-modeling-Figure

Two fundamental recommendations can be made regarding the creation of link tables:

  1. Always use the Distinct keyword when creating the link table. This is done so we are only loading dimension fields into this table and we should not have duplicate records.
  2. If the Key field is composed of several individual fields, it can be resource intensive for the application. In these cases, we could use the Autonumber() function described previously.

Proceed to save and close the document.

Finding a balance

We have outlined two ways for dealing with multiple fact tables in a data model. If we were to ask which of these methods is better, we would need to say, again, "it depends." There is no definite best, and the decision about which to use will depend entirely on the specific scenario where it has to be implemented.

When deciding on data model design, you should always ask yourself if the "structural asymmetry" we talked about before is something you could accept in your data model, or if the asymmetry is highly significant and therefore you would prefer to live with the "additional hop" in the data model.

Choosing between approaches can also impact the application's size. Take a look at both of the files created using each of the preceding methods and see which one is smaller in disk size .Can you guess why the link table approach produces a smaller file? It's because we've taken out some fields from the relatively large fact tables and placed them into a smaller link table.

For QlikView, both of these approaches are OK. They work as they should and there should not be calculation differences resulting from using them one over the other.

For the purpose of continuity throughout the rest of the book, we will be working with the new data model created using the first method: Concatenation. Therefore, make sure to integrate the Employment Statistics table to the original Airline Operations document by concatenating both fact tables as previously shown.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Qlik View Topics