Extracting data—two hands-on examples - Qlik View

In this section, we will go through the steps required to extract data into a QlikView document. The extraction process through which we pull data into the QlikView document consists of:

  1. Connecting to the database
  2. Querying the database
  3. Reloading the QlikView script

We will provide two examples of data extraction using two different data sources:

  • A Microsoft Access database
  • A table file

Extracting data from MS Access

Our first example will demonstrate how to extract data from an MS Access database. It will be a good example since the connection process is very similar to that used when connecting to most major DBMSs. We will be using one of the drivers discussed in the previous section, and covering the steps required in the entire process.

Before continuing, make sure a database file named Dimension Tables.mdb is in the Data FilesMDBs folder. If not, proceed to create the folder, if necessary, and copy the file.

Configuring the driver

Drivers for MS Access databases are often installed, by default, with the Windows OS. The default drivers are built for 32-bit architectures, but that won't be a problem for us since, as pointed out earlier, both editions of QlikView can make use of any 32-bit driver.

Connectivity to an MS Access database is provided by Microsoft either through ODBC or OLE DB drivers. At this point, we must decide which of the two types of drivers we want to use. Since the connection setup via the OLE DB driver is more straightforward, we will opt for that method. However, we will take a moment to briefly describe the configuration process for ODBC drivers.

The following process is not necessary when using OLE DB drivers, and, for the purpose of our example, we can skip these steps.

How to set up an ODBC connection

In order to create the ODBC configuration, we need to go to the ODBC Data Source Administrator window. Access this window via Control Panel | Administrative Tools | Data Sources (ODBC).

Accessing a 32-bit ODBC Data Source Administrator from a 64-bit machine

If you are using a 64-bit machine and need to configure a 32-bit ODBC driver, you will need to access ODBC Data Source Administrator from a different location. Go to the %systemdrive%Windows SysWoW64 folder and launch the Odbcad3 2.exe file. Otherwise, only 64-bit drivers will be available to configure.

Once in the ODBC Data Source Administrator window, we'll go to the System DSNtab so the configuration we set is visible for any user of the machine, and then we'll click on Add. , as shown in the following screenshot:

How to set up an ODBC connection

The Create New Data Source window appears, which is where we will be able to select the driver we want to use, as shown in the following screenshot:

Create New Data Source

After clicking on Finish, the configuration window will appear. Depending on the driver you select, the configuration and parameters required for the connection will vary. However, the configurations for almost all drivers on this, and all of the following dialog windows, are very simple with self-explanatory fields.

After configuration, an ODBC connection can be used from QlikView in the same manner as the OLE DB connection, just by selecting ODBC instead of OLE DB when creating the connection string, which is described in the following section.

Let us now continue with our example using the OLE DB driver. If you followed the process just described, click on Cancel to follow the OLE DB procedure instead.

Creating the OLE DB connection string

The connection string is basically a set of instructions and specifications with which QlikView will establish the communication with the database. It contains the database name or network location, the driver name, as well as the credentials with which we will access the database (username and password), if needed.The connection string is created from QlikView, so the next thing we will do is open the QlikView document Airline Operations.qvwthat we created earlier in Chapter

Seeing is Believing. We will add new tables to the data model, this time extracting them from MS Access, to continue exploring how QlikView's built-in extraction capabilities work.

Once the QlikView document is opened, go over to the Edit Script window (Ctrl + E or File | Edit Script...). We've already worked briefly with this window, and this time, we will make use of the Database section in the Data tab.

Activate the Airports tab and position the cursor on the last line, below the existing Load statement. From the Database section, select OLE DB from the drop-down menu, and click on the Connect. button, shown in the following screenshot:

Creating the OLE DB connection string

The Force 32 Bit checkmark is used to specify that QlikView should look for 32-bit drivers, instead of looking for 64-bit drivers, installed on the computer.It is relevant for both ODBC and OLE DB drivers.

If you have installed the 64-bit edition of QlikView, please mark this option before clicking the Connect... button so QlikView uses the correct connection engine.

Remember, we are using the 32-bit connection only because we don't have a 64-bit driver available to query our database. In all cases in which the 64-bit driver is available, it is advisable to use that instead, since using 32-bit drivers might significantly reduces performance over the 64-bit equivalent.

The Data Link Properties window, which is composed of several tabs, will appear. The first tab (Provider) shows a list of all the available OLE DB drivers. Here, we will select Microsoft Jet 4.0 OLE DB Provider. Once selected, click on Next >>to move to the Connection tab, in which we will specify the database file we want to connect to.

Data Link properties

Click on the browse (...) button, placed between number 1 and number 2, to select the database file we have stored in the Data FilesMDBs folder.

Since the database file does not require logon credentials, we will leave the User name and Password fields blank. The Blank password checkmark should be selected as well.

We will now click on Test Connection to make sure the connection is established. A message will indicate if the test went well and, if so, we may now click on OK. If not, we need to make sure the configuration is correct and verify that the database file is accessible.

Scrambling user credentials

For databases that require logon credentials, the username and password will be stored in the connection string either as plain text or as scrambled text. To store them as scrambled text, select the Scramble User Credentials checkbox, in the Settings tab of the tool pane in the Script Editor window before generating the connection string.

After clicking on OK, you should see the newly generated connection string as part of the script.

It is possible to have several connection strings in the same QlikView document. This allows you to pull data from different sources. Each time that a new connection string is found during the script execution, the previous connection is automatically disconnected. We can also use the Disconnect; statement to explicitly drop the previous database connection before connecting to the next one.

An elegant and administration-friendly approach is to store the connection string in a text file, residing in a folder that is reachable from your QVW files. Import this connection into every QVW via an include statement (from the Edit Script window, select Insert | Include Statement). The benefit of this approach is that if the connection string should change, you only need to modify it in one place, and all of the corresponding QVW files will automatically use the updated Connect statement.

Querying the database

Now that we have established communication with the database via our connection string, we can begin retrieving data from it using the Structured Query Language (SQL). QlikView makes it easy for us to create the Select statements to build our queries.

The Create Select Statement wizard

The Select statement is used to pull data from the database into our QlikView document, and tells the DBMS the specific set of data we want. We could just type it manually, but instead we will use the Create Select Statement dialog in order to find the table we want to read, as well as the fields we need, and automatically populate the required QlikView script.

Since we've already created the Connect statement, we can go ahead and click on the Select button from the Data tab. The Create Select Statement dialog window will appear. This window is used to specify the database, table, and fields we want to load. After we click on OK, the corresponding Select statement will be generated.

Let's look at the components of this particular window, so we know what each option does. The following screenshot shows the Create Select Statement dialog window:

Create Select Statement dialog window

We can split the window into three horizontal panes. In the top pane, we specify where the tables we want to read are located. We have a Database drop-down field and an Owner field of this database. We also have three buttons to the right, in this top pane:

  • Connect.: This button lets you create another connection. It is rarely used since the current window is usually opened after a connection is already created.Driver...: This button provides some information about the driver you are currently using.
  • Support.: This button opens a pop-up window, which may or may not contain data, but is intended to provide information about the database.

In the middle pane, we can choose what will become the core of our Select statement: the table and fields we will read.

On the left side of this middle pane, we have several checkmarks that will allow us to filter the list of tables we see to the right. We can select to see only Tables, Views, both, and so on. Once we apply the appropriate filter, we can move on to pick the table we need. Note that the list is alphabetically ordered, so you can type the first letter of the table you want to find, to automatically scroll to the section where the tables whose names begin with the specified letter are listed, and then scroll further down until you find it.

After highlighting the table, we can move on to the next section of this middle pane, which is the list of fields which the selected table contains. In this section, we have an additional option which is the ability to sort the listed fields by Text Order or by the Original Order on which they are stored in the database.

Note that we will always see a star symbol at the top of the list. We can highlight this "wildcard" character if we want to pull all of the fields contained in the table. In case we want to load only a few, but not all of the fields, we can do that by highlighting each of them individually and not with the star symbol.

We can also click on Show Icon for Key Fields, if we want to identify the fields that are defined as key fields in the database.And finally, we have the bottom pane, which at the same time is divided into several tabs. Let's go through each of them briefly:

  • Script: This tab will give a preview of what the wizard will create based on our selections from the panes above.
    1. To the right, we have additional options to specify how we want the script to be generated (Column, Row, or Structured). We can also add a Preceding Load, which lists the resulting fields individually and makes them available for QlikView-side operations.
    2. We also have an Add. button which basically allows the creation of several Select statements involving several tables at the same time, without needing to click on OK and then return to the Create Select Statement window for each one.
  • Table: This tab is used to view general information about the selected database table.
  • Columns: This tab will provide specific information about the properties of the fields that make up the table.
  • Preview: This tab will show a preview of the table, consisting of the first few rows.
  • Blobs: This tab provides the ability to bundle objects contained in a Binary Large Object (Blob) field into the QlikView application. This feature is only supported when using an ODBC connection (OLE DB is not supported for this).

Adding the airport tables

Even though the Access database contains all of the dimension tables used in the Airline Operations data model and more, we will only extract the tables corresponding to the origin and destination airports and incorporate them into our application.

Using the Create Select Statement wizard described earlier, create the Select statement to extract the Origin Airports table with both the %Origin Airport ID and Origin Airport fields. Make sure to create a script in the form of Column with each field name listed in Preceding Load for us to manipulate it further, if needed.

The following screenshot shows the configuration we need in the Create Select Statement dialog for this particular example:

Create Select Statement dialog for this particular example

The resulting script is as follows:

LOAD '%Origin Airport ID', 'Origin Airport'; SQL SELECT '%Origin Airport ID', 'Origin Airport' FROM 'Origin Airports';

Did you notice how we didn't use the star symbol when selecting the list of fields to retrieve, even when we needed to pull all fields? This is a best practice, to ensure that only the required fields are returned by the query, and no more. Suppose, for example, that a new field is added to the source table. If we used the star symbol to query the database, we would automatically retrieve this new field even when it's not necessary for our data model, wasting valuable bandwidth in the process.

Follow the same process to add the script needed to load the Destination Airports table.

Reloading the script

We now have a query to execute, and need to reload the script to actually pull the data into our QlikView document (and into RAM, for as long as the QlikView document is open). We can either select File | Reload, press Ctrl + R, or click on the Reload button from the toolbar.

After this, as we've seen previously, a Script Execution Progress window, shown below, appears that shows feedback about the loading process. It also tells us, after reading a table, how many rows it fetched, among other things.

Reloading the script

By default, the Close when finished checkmark, at the bottom-left of the window, is enabled. It tells QlikView to close the progress window immediately after finishing the script execution. It is sometimes useful to disable this property, so we can get an overview of the entire process after it is finished. You can either uncheck the option right from this window or change the parameter via Settings | User Preferences | Keep Progress Open after Reload. You will know that the script execution has ended when the Close button becomes enabled. Click on Close to dismiss the dialog.

Since we executed the reload operation from the Edit Script window, the Sheet Properties dialog appears immediately after script execution, with the Fields tab active by default. As we saw in Chapter Seeing is Believing, through this window we can add fields to our workspace in the form of list boxes and start reviewing what we got from the query we ran.

This dialog does not appear when launching the reload from outside the Script Editor. If that is the case, you can access it by right-clicking on a blank space of the sheet area and then clicking on Select Fields....

Click on OK to dismiss the Sheet Properties dialog window.

The resulting data model

If we press Ctrl + T at this moment, Table Viewer will appear and we will be able to see the resulting data model. The data model now consists of the tables we added previously, in Chapter(Seeing is Believing), Seeing is Believing, and the two tables we added from the MS Access database.

The following screenshot shows the resulting data model:

The resulting data model

Table Viewer is a great tool to analyze the data model and check table associations. In the next chapter, we will describe in more detail how we can take advantage of it.

After reviewing Table Viewer, click on OK to dismiss it, and make sure to save the changes we made to the application before moving on to the next section.

Loading a table file

We have now covered data extraction from a typical database via ODBC and OLE DB. QlikView is also able to load data from a table file, such as an Excel, CSV, TXT, and XML file, among others.

Let's go through an example of loading a CSV file, so we can describe the steps and dialog windows involved. We will load two additional tables in the CSV format with the purpose of demonstrating that, no matter what the source is, at the end (that is, once added to the QlikView document and data model) all tables are equal. The new tables are contained in the Origin Markets.csv and Destination Markets.csv files, so make sure you have them in Data FilesCSVs.

We will add this table to the same data model used in the previous section, so if you already closed the Airline Operations.qvwdocument, please open it again. Go to the Edit Script window, activate the Airports tab, and position the cursor on the very last line. Next, click on the Table Files. button from the Data tab in the tool pane below. The Open Local Files dialog will appear, in which we must select the file we want to load. In our case, we will first browse and select the file named Origin Markets.csv and then click on Open.

A new window pops up. It will be the equivalent of the Create Select Statement dialog we previously discussed. Here we will define some configuration options about how the Load statement will be created.

This configuration is separated into several steps in the dialog window. We will go through the most important sections of this wizard.

Specifying the file attributes

The first step is defining the attributes of the file we are reading, as well as the fields we want to include. In the following screenshot, we can see the contents of the File Wizard: Type section:

Specifying the file attributes

We can divide this window into three panes; the first (left pane) contains radio buttons for us to select the corresponding File Type. The second pane, to the upper right of the window, contains various settings about the file attributes. These settings will vary depending on the file type. Finally, we have a preview pane, at the bottom, that reflects how the table file will be interpreted by QlikView with the configuration we are setting on the two other panes. The preview pane provides immediate feedback after we modify any of the configuration options in the rest of the window.

The File Wizard tries to determine the File Type automatically, and it is pretty accurate. However, you can easily change it in case it got it wrong, by selecting the corresponding radio button. As you can see, the file types that can be loaded using this wizard include: Delimited (CSV, TSV, and so on), Fixed Record (when the file does not contain a specific character as a separator, but is consistent with its column widths), Excel files, Html, Qvd(QlikView datafiles, like the ones we used in Chapter

Seeing is Believing),Xml, and Qvx(QlikView eXchange format). As part of this example, we will further discuss the settings involved when reading a CSV (delimited) file. The configuration for the rest of the file types is very similar to this, so we will not go into detail for each of them.

The CSV attributes

Since our CSV file is a delimited file, it is possible to select which character is used as a separator. This is done via the Delimiter drop-down field. The Quoting schema to be used can also be specified, with the available options of Standard, MSQ, and None. By default (that is, using MSQ), straight double quotes ("") and straight single quotes (' ') are permitted for field values, but with one condition: they must be in both the first and the last non-blank character of a field value.

In case the table file has more than one line as the header record, you can specify it in this same window, with the option to set it as a number of lines or number of bytes.

QlikView can also recognize comment lines in the file, when we specify the character(s) which identifies a comment line by typing it in the Comment field.

The other options we can set are: whether the file has the field names defined in the header row (Embedded) or not, and whether QlikView should ignore an End of File mark (Ignore EOF).

A word on Quoting

If a cell contained in the table being read has only one quotation mark, or if the quotation marks are not in the first and last non-blank character position, the script reload will not be correctly executed and the file will not be properly read.

The following diagram shows three different scenarios that can be encountered when reading text files with quotes. The first two will either result in an incomplete read, or the table will contain "dirty" data. The third scenario will be read correctly. Pay special attention to the value of the third record in Field B of the input table for each scenario.

A word on Quoting

We need to emphasize this point because, for example, in the first scenario stated previously, QlikView will not alert about a possible misinterpretation of the input table. Instead, it will simply mark the dirty record as the end of the file and finish the extraction, with all of the subsequent records being left out. This potential issue is not always apparent in the preview pane of the File Wizard, as the offending character may be further down in the table. Look out for the listboxes containing mixed types of content, as this might be a sign that the described issue is present.

The solution to the first and second scenarios is to change the Quoting schema to Standard or None from the File Wizard: Type window.

Previewing

The preview pane, as stated previously, will show how the file will be read by QlikView with the configuration we have defined. We can also use this pane to rename fields and/or exclude columns from the extraction.

To change a field name from the preview pane, simply click on the cell containing the field name (dark gray, with an X mark on its right side), and type the new field name. If the file we are currently loading does not contain field labels, QlikView defaults them to something like @1, @2, and so on, depending on the column number.

After we've set these configurations, we are done with this window. Let's click on Next to continue with the following step in File Wizard.

The transformation step

The second step in the File Wizard is the Transform process. Since, for now, we are not going to make any transformation to our file, we are going to skip this step. However, it is important that you know how and when this can be useful for you. Chapter Basic Data Transformation, covers this topic more in depth.

Let's just say, for now, that the transformation step is used when loading files that are not in a format consistent with a traditional table (that is, pure and clean rows and columns). The Transform dialog is shown here:

The transformation step

As you can see, there is a warning about using this feature: it requires a large amount of RAM. If your source file is large, you might not find this Transform wizard useful, and other techniques will need to be employed to process the source data.

Click on Next once more to skip the transformation step, and move to the third step in the File Wizard.

Refining the input table

There are several options that can be defined to treat a table file and transform/ convert it in the loading process. This is done in the File Wizard: Options step depicted in the following screenshot:

Refining the input table

We can set options such as Where..., specify QlikView to treat the table as a Crosstable..., or interpret it as a Hierarchy.... However, we will not use any of these options when loading the Origin Market table, but we will describe their components and functions for introductory purposes.

Where Clause wizard

A Where Clause is used when we need to exclude records from the input table. We can specify on which condition these records should be left out. When clicking on the Where. button from the File Wizard: Options window, the following dialog will appear:

Where Clause wizard

The Simple configuration allows us to set the commonly used conditions of Field - Operator - Value. For example, Field A = X will exclude all records from the input table that have the value X in Field A. The operators available in this wizard are shown in the screenshot below:

Data-Sources-Figure

The Advanced option allows us to write the desired Where Clause by hand, whereas the Empty Template will only add a Where Clause similar to the following:

Where (1 = 1)

This will not exclude any records, but you can manipulate it after the script is created.

The crosstable

QlikView is also able to convert crosstables (a table where there is a column for each dimension in a range) to traditional tables. For the file we are loading in this example, we won't need this function, but it's important that you know about it since this table structure is very common, particularly in budget spreadsheets. An example of a crosstable is shown here:

The crosstable

These tables, because of their structure, are not appropriate for a QlikView data model. We will discuss this topic further in Chapter Data Modeling ,Best Practices, along with other transformation options and hierarchy tables.

The resulting script

For now, let's continue to the next step in the File Wizard. Click on Next from the File Wizard: Options window, and the File Type: Script dialog will appear.

This window basically lets us take a look at the generated script with the configurations set in the previous steps. Additionally, we can set a Max Line Length parameter to make the script easier to read once it is pasted into the Script Editor, or enable the Load All (*) option, which will generate a script that in distinctively loads every field in the source table.

If you have previously changed field labels, or excluded some fields from the load (using the File Wizard: Type window), you should not use the Load All (*) option, since it will override those previous settings.

The File Wizard: Script dialog window is shown in the following screenshot:

File Wizard: Script dialog window

Let's leave these options as they are for now, and click on Finish. The script will be generated and added to the Edit Script window.

Please note that when reading data from a plain table file, we do not use a Select statement but only a Load statement. That is because the SQL Select statement is only used to send the appropriate command to the ODBC, OLE DB, or some other data connectivity drivers, and query the database. Since we are only loading a local file, no driver is being used, and the built-in QlikView extraction functionality is used to pull data into our document.
We can now reload the script, and the table will be pulled into QlikView and treated like any other table from any other source, from hereafter.

Save before reloading

The reload operation will execute every script statement, and there will be times when the script execution will fail for a number of reasons. Therefore, it's a good practice to hit Save before actually reloading the script. In some cases, when the reload fails, changes since the last save are lost.

We have now loaded the Origin Market table. Take a moment to do the same with the Destination Markets table and include it in our data model. Afterwards, save and close the QlikView document.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

Qlik View Topics