The first set of best practices that we present on data modeling are those related to data consistency. This is one of the most important things we need to take care of when building QlikView documents. Let's look at some best practices that we can use to assure our data is concise and consistent.
Dealing with dimensions without facts
Sometimes, a dimension table can contain values that do not have any associated facts. To demonstrate this, let's take a second look at the data model we built in Chapter Data Sources, and have been using ever since:
We learned earlier what this means: of all the distinct possible values for %Aircraft Type id, 100% of those values appear in the Aircraft Types dimension table, while only 48% of the values appear in the Main Data table. In other words, only 48% of aircrafts have actually made any flights. Before we look at how to remove these aircraft types from the model, let's first quickly investigate which aircraft types have not logged any flights:
By looking at the Flights table box, shown in the following screenshot, we can see that the selected aircrafts indeed have no flight data associated with them:
Of course, it can be very useful for a business analyst to see which dimensions do not have any fact data associated with them. For that reason, it may be worthwhile to keep this information in the model. Whenever these types of issues present themselves, it is important to check with the business users what their wishes are.
For our example, we will remove the aircraft types that do not have any associated flight data. To do that, follow these steps:
The non-matching aircrafts are no longer in the data model after the reload. The code that we added to the script uses a where clause combined with the Exists() function. We are essentially filtering out any records in which the AC_TYPEID field from the dimension table does not have a corresponding value in the %Aircraft Type id field already loaded in the Main Data table.
The Exists() function takes two parameters:WHERE Exists([%Aircraft Type ID], AC_TYPEID);
The first parameter specifies the field on which we need to check to see if there are any occurrences of the values contained in the second field, the one specified in the second parameter.
In some cases, the two fields being compared have the same name in both the input dimension table and the fact table already loaded. If that's the case, we could use a simplified, one-parameter, syntax as follows:Where Exists([%Aircraft Type ID]);
Depending on how the field names from the input table are defined, we should use the appropriate syntax from the two presented above. The main advantage of the second scenario (one-parameter syntax) is that, when loading from a QVD, it will still perform as an optimized load, while the first scenario will not.
An alternative to using the Exists() function is the use of the keep prefix, which will be added before the LOAD keyword. As shown in the previous chapter, by using left keep or right keep, we can limit the records being loaded to those that have a matching key in the already loaded fact table. A benefit of using this prefix is that the result set can be limited on multiple fields, while the Exists() function can only use a single field. However, script processing of the keep prefix can be a lot slower on larger data sets, so the Exists() function is the preferred method when ever possible.
Let's take another approach to dealing with this problem this time using the keep prefix.
An alternative approach
The previous example depends on the fact table being loaded before the dimension tables. It often makes more sense to load dimension tables first and fact tables later.
In that scenario, the solution shown before will not work because the actual fact table has not yet been loaded at the time we load the dimension table. There is no way for us to "load only dimension values for which facts have been loaded." Therefore, the Exists() function cannot be used.
The alternative approach consists of first loading the entire dimension table and then reducing the record set based on the corresponding values in the fact table after the facts have been loaded. Let's see how this works by following these steps:
The code we inserted creates a temporary table, Temp_Aircraft_Type_Dim, which contains all of the distinct %Aircraft Type id values from the Main Data fact table. By using a right keep statement, the data in the original Aircraft Types table is reduced to only those rows that are associated with the Main Data table. After the Aircraft Types table has been truncated, we remove the temporary table.
The Left Keep prefix can also be used, accompanying the Load statement corresponding to the Aircraft Types dimension table, if the Main Data table had been loaded first.
We will now be able to see that when all values from the Year listbox are selected, no aircrafts are being excluded in the Aircraft Type listbox.
A solo exercise
Most of the dimensions we loaded to the Airline Operations app in Chapter Seeing is Believing and Chapter Data Sources present the scenario described previously. That is, the subset ratio for most key fields in the Main Data table is lower than 100%.
The end users of our QlikView document, High Cloud Airlines, have decided that they don't need unused values in the dimension tables as it corresponds to either airlines that are no longer in operation or aircrafts that are no longer in use.
Take what you've learned in this section and reduce all of the dimension tables to contain only those values that appear in the fact table and save the updated document.
The Origin and Destination Airports dimension tables perform a direct query to the source database. Therefore, the Exists() function cannot be used as described here. A QlikView function might not be interpreted as expected in a direct database query. Therefore, we need to use the Left Keep prefix approach in those two cases to achieve the expected result.
Once you've reduced the dimension tables and saved the document, take a look at the size of the QVW file and you'll see the impact of removing unnecessary data.In this case, the document size on disk will be reduced from around 55 MB to approximately 33 MB. This will also have a positive impact on RAM usage.In the next section, we'll work with a side example, so you may now close the Airline Operations.qvw document.
Dealing with facts without dimensions
Of course, when dimensions can exist without related facts, the inverse can also be true. Let's look at how we can deal with facts that do not have any associated dimension values.
As you may have noticed in the Table Viewer window, our current example data model is a bit too tidy. There aren't any dimensionless facts. However, to illustrate the new scenario, we've prepared a side example for which you will find the corresponding data files in the Airline OperationsSide examplesChapter (Data Modeling Best Practices)folder. Make sure you have the Flights.csv and Aircrafts.csv files in the specified folder. Then, follow these steps:
After finishing the script execution, if we open the Table Viewer window (Ctrl + T), we can analyze the subset ratio for the %Aircraft Type id field, seen here:
Notice that the subset ratio is 100% in the Flights table, but below 100 percent in the Aircrafts table. In other words, there are now flights with no corresponding dimension data.
Having facts without an associated dimension is undesirable. When we use the dimension in a dashboard, facts that are not associated all get grouped under a hyphen symbol. Since this is basically a null value, this group of facts can not be easily selected by the user.
To illustrate this, let's create a new bar chart with the Aircraft Group field as dimension, which is an aircraft attribute, and Sum ([# Departures Performed])/1000 as the expression. We will end up having something like the following:
On the other hand, we cannot just remove these dimensionless records from the fact table as it will skew the total amounts.
While the appropriate response is always discussed with and decided by the business users, a very common approach is to add dummy dimension values to the dimension table. To do this in our current example, let's follow these steps:
Here's what the added script does:
When checking the Table Viewer we'll see that the Subset Ratio value for %Aircraft Type id is now 100% on both tables. This can be verified by looking at the previously created chart, which now groups all of the Unknown values, as seen here:
Additionally, when adding a new listbox with one of the various aircraft attributes, we can see that the Unknown values are being listed as well, as shown in the following screenshot:
Save and close the Dimensionless facts.qvw file to continue.
Qlik View Related Interview Questions
|Microstrategy Interview Questions||IBM Cognos Interview Questions|
|PL/SQL Interview Questions||MSBI Interview Questions|
|VBA For Excel Interview Questions||SAP BO Interview Questions|
|SQL Database Interview Questions||Qlik View Interview Questions|
|R Programming language Interview Questions||Pentaho Interview Questions|
|Advanced SAS Interview Questions|
Qlik View Tutorial
Seeing Is Believing
Data Modeling Best Practices
Basic Data Transformation
Set Analysis And Point In Time Reporting
Advanced Data Transformation
More On Visual Design And User Experience
All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
Wisdomjobs.com is one of the best job search sites in India.