Reading Flat Files - OLAP

You can use file reader OLAP DML commands to acquire data from external files in various formats: binary, packed decimal, or text. While you can use some of the file reader commands individually, it is best to place them in a program. You can thereby minimize mistakes in typing and test your commands on smaller sets of data. A program also enables you to perform operations in which several commands are used together to loop over many records in a file. Afterward, you can use the program to refresh your data.

About the File Reader Programs

The following table describes the OLAP DML file reader commands used in this chapter.

OLAP DML File Reader Commands

FILEOPEN     Opens a file and returns a file unit number (handle) that identifies the file to the other file reader commands.
FILEREAD     Reads records from an open file and processes all of them with a single command.
FILEVIEW     Processes the fields of the current record of an open file, one record at a time.
FILENEXT     Makes the next record of an open file the current record, typically in a WHILE loop with FILEVIEW; returns NO when there are no more records.
FILECLOSE    Closes an open file.

Writing a Program for Reading Files

While reading from a file, you can format the data from each field individually, and use DML functions to process the information before assigning it to a workspace object. Reading a file generally involves the following steps.

  1. Open the data file.
  2. Read data from the file one record or line at a time.
  3. Process the data and assign it to one or more workspace objects.
  4. Close the file.

The FILEREAD and FILEVIEW commands have the same attributes and can do the same processing on your data. However, they differ in important ways:

  • The FILEREAD command loops over all records in the file and processes them automatically. Use FILEREAD when all of the records that you want to read from the file have the same format. FILEREAD is easier to use and faster than FILEVIEW.
    Because FILEREAD can process most files, it is the command shown in the examples in this chapter.
  • The FILEVIEW command processes one record at a time. FILEVIEW is the more powerful of the two file-reading commands: it can process all of the files that FILEREAD can, and it can also process files that contain different types of records.

The following example provides a template for developing a file-reading program in the OLAP DML.

Template for Reading Flat Files

VARIABLE funit INTEGER                        "Define local variable for file handle
TRAP ON CLEANUP                               "Divert processing on error to CLEANUP label
funit = FILEOPEN('directory/datafile' READ)   "Open the file
FILEREAD funit                                "Read the file with FILEREAD
   .
   .
   .
CLEANUP:                                      "Cleanup label
IF funit NE na                                "Close the file
   THEN FILECLOSE funit

Mapping Fields to Workspace Objects

The FILEREAD command maps fields to workspace objects. A source file can be structured with records in any of the following ways:

  • Ruled files contain data in columns, with fields defined by a starting position and a width.
  • Structured PRN files contain strings of text or numbers. A text field is enclosed in quotation marks. A number field can contain periods (.) in addition to numbers, but any other character, including spaces and commas, terminates the field.
  • CSV files (for Comma-Separated Values) use a special character, the delimiter, to separate the fields in a record.

The target for the data in an analytic workspace is either a dimension, a relation, or a variable. Dimensions can either be maintained by adding new members, or they can be used just to align incoming data with existing dimension members. In standard form analytic workspaces, a variable is typically an attribute or a measure.

Reading Ruled Files

The basic syntax of FILEREAD for mapping the data in ruled files is:

COLUMN n WIDTH n workspace_object

The following is an example of four records from a data file. From left to right, the columns are channels, products, customers, time periods, and units. The first column (channels) is 10 characters wide, and the other columns are 11 characters wide.

2         13         51         54         2
2         13         51         56         2
2         13         51         57         2
2         13         51         58         2

The following FILEREAD command reads the data from the last column into the UNITS_VARIABLE variable, aligning the values in the other four columns with existing dimension members. The RULED keyword is optional, since it is the default record format.

FILEREAD funit RULED -
   COLUMN 1 WIDTH 10 channel -
   COLUMN 11 WIDTH 11 product -
   COLUMN 22 WIDTH 11 customer -
   COLUMN 33 WIDTH 11 time -
   COLUMN 44 WIDTH 11 units_variable
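The COLUMN n WIDTH n mapping can be mimicked in Python with fixed-width slices, as a sketch of what the ruled-file reader does with each record (the field list mirrors the example above; OLAP DML columns are 1-based, Python slices are 0-based):

```python
# (start_column, width, field_name) triples from the example above.
RULED_FIELDS = [
    (1, 10, "channel"),
    (11, 11, "product"),
    (22, 11, "customer"),
    (33, 11, "time"),
    (44, 11, "units_variable"),
]

def parse_ruled(record):
    """Extract fixed-width fields from one ruled record."""
    row = {}
    for start, width, name in RULED_FIELDS:
        # Convert the 1-based column to a 0-based slice and trim padding.
        row[name] = record[start - 1:start - 1 + width].strip()
    return row
```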

Reading Structured PRN Files

The basic syntax in FILEREAD for mapping structured data is:

FIELD n workspace_object

The same data file shown previously in "Reading Ruled Files" can be read with the following command:

FILEREAD funit STRUCTURED -
   FIELD 1 channel -
   FIELD 2 product -
   FIELD 3 customer -
   FIELD 4 time -
   FIELD 5 units_variable
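The field rules for structured PRN files (quoted strings are text fields; digits and periods form a number field; any other character, including spaces and commas, terminates a field) can be sketched as a small tokenizer; this is an illustrative Python analogy, not the OLAP DML implementation:

```python
def tokenize_prn(record):
    """Split one structured PRN record into its fields."""
    fields, i = [], 0
    while i < len(record):
        ch = record[i]
        if ch == '"':                      # text field enclosed in quotes
            j = record.index('"', i + 1)
            fields.append(record[i + 1:j])
            i = j + 1
        elif ch.isdigit() or ch == ".":    # number field: digits and periods
            j = i
            while j < len(record) and (record[j].isdigit() or record[j] == "."):
                j += 1
            fields.append(record[i:j])
            i = j
        else:                              # any other character ends a field
            i += 1
    return fields
```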

Reading CSV Files

The basic syntax for reading a CSV file is the same as for structured PRN files:

FIELD n workspace_object

The following is an example of four records from a CSV file, in which a comma is the delimiter. The fields are the same as the previous data file shown in "Reading Ruled Files": channels, products, customers, time periods, and units.

2,13,51,54,2
2,13,51,56,2
2,13,51,57,2
2,13,51,58,2

This file can be read with the following command; the DELIMITER clause is optional in this case, because a comma is the default delimiter.

FILEREAD funit CSV DELIMITER ',' -
   FIELD 1 channel -
   FIELD 2 product -
   FIELD 3 customer -
   FIELD 4 time -
   FIELD 5 units_variable
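As a rough Python analogy, the same FIELD-to-object mapping for a CSV file can be expressed with the standard `csv` module (the field names below mirror the FIELD clauses above; the delimiter defaults to a comma, as in the DML):

```python
import csv
import io

FIELD_NAMES = ["channel", "product", "customer", "time", "units_variable"]

def read_csv_records(text, delimiter=","):
    """Yield one {field_name: value} dict per CSV record."""
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        yield dict(zip(FIELD_NAMES, record))
```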

Setting Dimension Status for Reading Measures

Whenever you read data values into a variable, you must set the status of each dimension. Typically, the incoming records contain a field for each dimension; when a record is processed in the analytic workspace, the dimensions are temporarily limited to these values so that data targeted at a variable or relation is stored in the correct cell. However, if the records omit one or more dimensions, then you must set them manually before reading the file.

For example, if your file contains data only for the Direct Sales channel for August 2003, and thus does not have fields specifying the channel or time, then your program must limit the CHANNEL and TIME dimensions before reading the file. Otherwise, the data is aligned with the first member of those dimensions (All Channels and Jan-98).
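The defaulting behavior can be sketched as follows; this is a hypothetical Python model of cell addressing, in which a manual limit (like a LIMIT command) supplies the member for any dimension the record omits, and the dimension's first member is used otherwise:

```python
# Hypothetical dimensions and members for illustration only.
DIMENSIONS = {
    "channel": ["All Channels", "Direct Sales"],
    "time": ["Jan-98", "Aug-03"],
}
status = {}  # manual limits, analogous to LIMIT in the OLAP DML

def target_cell(record):
    """Resolve the cell address for one incoming record."""
    return tuple(
        record.get(dim) or status.get(dim) or members[0]
        for dim, members in DIMENSIONS.items()
    )
```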

Optimizing a Data Load

Your data will load fastest if the variables in your analytic workspace are defined with fastest and slowest varying dimensions that match the order of records in the source data file. If you have control over the order of records in the source data file, then you can create the data file to match the variables in your analytic workspace. Otherwise, you may need to choose between optimizing for loads and optimizing for queries when defining the dimension order of variables in your analytic workspace.
For example, a data file might have records sorted in this order:

  • All records for the first channel, then all records for the second channel, and so forth.
  • Within each channel, all records for the first product, then all records for the second product, and so forth.
  • Within each product, all records for the first customer, then all records for the second customer, and so forth.
  • Within each customer, all records for the first time period, then all records for the second time period, and so forth.

In a workspace variable definition, the order of the dimensions identifies the way the data is stored. The fastest-varying dimension is listed first, and the slowest-varying dimension is listed last.

For this sample file, the data load will proceed fastest if the target variable is defined with TIME as the fastest varying dimension and CHANNEL as the slowest varying dimension, so the dimensions are listed in this order: TIME PRODUCT CUSTOMER CHANNEL. With a composite dimension, the definition looks like this:

DEFINE UNITS_VARIABLE VARIABLE DECIMAL <TIME UNITS_CUBE_COMPOSITE <CUSTOMER PRODUCT CHANNEL>>

Having the TIME dimension as the fastest varying dimension outside the composite also provides good run-time performance for time-based analysis, because the time periods are clustered together. This is a best-case scenario, in which the workspace variables are optimized for queries, and the data file is sorted correctly for the fastest loads.

However, if you have a separate data file for each time period, then TIME becomes the slowest-varying dimension for the load. In this case, there is a conflict between the dimension order that optimizes queries, and the dimension order that optimizes data loads. You need to choose which dimension order is best under these circumstances.

If you have a small batch window in which to load data, you may need to optimize for the data load by defining variables with TIME as the last dimension, as shown here:

DEFINE UNITS_VARIABLE VARIABLE DECIMAL <UNITS_CUBE_COMPOSITE <CUSTOMER PRODUCT CHANNEL> TIME>
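The effect of dimension order on storage can be sketched with a linear-offset calculation; this is an illustrative Python model (with hypothetical dimension sizes), not the workspace's actual storage scheme, showing that consecutive members of the first-listed, fastest-varying dimension occupy adjacent cells:

```python
def offset(indexes, sizes):
    """Linear cell offset; dimension 0 is the fastest varying."""
    off, stride = 0, 1
    for i, n in zip(indexes, sizes):
        off += i * stride
        stride *= n  # each later dimension varies more slowly
    return off
```

With TIME first, adjacent time periods map to adjacent offsets, so a file sorted with time varying fastest writes cells sequentially.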

Reading and Maintaining Dimension Members

The records in a data file typically contain fields for dimension values that identify the cell in which the data values should be stored. When all of the dimension values in the file already exist in your analytic workspace, you can use the default attribute MATCH in the dimension field description. MATCH accepts only dimension values that already are in the analytic workspace.

When an incoming value does not match, the command signals an error. Your file reader program can handle the error by skipping the record and continuing processing, or by halting the processing and letting you check the validity of the data file. The test for whether the error is caused by a new dimension member or another type of error is based on the transient value of the ERRORNAME option when the error is signaled.

The following template provides error handling that permits processing to continue when a dimension value in the data file does not match a dimension value in the analytic workspace.

When your data file contains all new, or a mixture of new and existing dimension values, you can add the new values and all the associated data to the analytic workspace by using the APPEND attribute in the field description, as shown here:

FILEREAD funit -
   COLUMN n APPEND WIDTH n dimension

Template for Skipping Records With New Dimension Members

VARIABLE funit INTEGER                        "Define local variable for file handle
TRAP ON oops                                  "Divert processing on error to oops label
funit = FILEOPEN('directory/datafile' READ)   "Open the file
next:                                         "Resume processing label
FILEREAD funit                                "Read the file with FILEREAD
   .
   .
   .
WHILE FILENEXT(funit)                         "Or read it with FILEVIEW
   DO
   FILEVIEW funit...
   DOEND
FILECLOSE funit                               "Close the file
RETURN                                        "End normal processing
oops:                                         "Error label
IF funit NE na AND ERRORNAME NE 'ATTN'
   THEN DO
      TRAP ON oops
      GOTO next                               "Resume processing at next label
      DOEND
IF funit NE na                                "Close the file on error
   THEN FILECLOSE funit
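The skip-and-continue logic of the template can be sketched in Python with ordinary exception handling; the set of known members and the record layout here are hypothetical, standing in for the MATCH test and the trap label:

```python
# Hypothetical set of existing dimension members (analogous to MATCH).
KNOWN_CHANNELS = {"2", "3"}
loaded, skipped = [], []

def load_records(records):
    """Load records, skipping any with an unknown dimension member."""
    for record in records:
        try:
            channel = record.split(",")[0]
            if channel not in KNOWN_CHANNELS:
                raise ValueError("no matching dimension member")
            loaded.append(record)
        except ValueError:
            skipped.append(record)  # skip this record and keep going
```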

Transforming Incoming Values

The FILEREAD command enables you to modify values as they are read into the analytic workspace.

Basic Transformations

To simply add characters before or after a value, use the LSET and RSET clauses. For example, if the incoming time periods are only the months (such as JAN, FEB, MAR), you can add the year before storing the values in the TIME dimension:

FILEREAD funit -
   COLUMN 1 WIDTH 15 RSET '-04' time

For other transformations, you can use the FILEVIEW command or any of the data manipulation functions of the OLAP DML. The object of these manipulations is the keyword VALUE, which represents the value of the current field. In this example, the incoming values are converted to upper case:

FILEREAD funit -
   COLUMN 1 WIDTH 15 time = UPCASE(VALUE)
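The effect of these two transformations can be sketched in Python; this is only an analogy of what RSET and UPCASE do to an incoming value, not the DML functions themselves:

```python
def rset(value, suffix):
    """Append text after the value, like the RSET clause."""
    return value + suffix

def upcase(value):
    """Convert the value to upper case, like the UPCASE function."""
    return value.upper()
```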

Using Relations to Align Dimension Values

If you need to match existing dimension values and a simple transformation cannot create a match, then you can create a relation in the analytic workspace that correlates the two sets of values. Take these steps:

  1. Create a new dimension for the incoming dimension values.
    You can define the dimension in Analytic Workspace Manager or with a command like this in OLAP Worksheet:
    DEFINE new_dimension DIMENSION TEXT
  2. Read the dimension values from the file.
  3. Create a relation between the two dimensions.
    You can define the relation in Analytic Workspace Manager or with a command like this in OLAP Worksheet:
    DEFINE relation RELATION dimension <new_dimension>
  4. Read the data from the file, using the relation to align the data.
    Use syntax like this:
    FILEREAD funit CSV -
       FIELD 1 dimension = relation(new_dimension VALUE)

For example, if your Product dimension uses SKUs (stock keeping units) as dimension members, and you want to add data that uses bar codes, then you create a dimension for the bar codes and a relation between the bar codes and the SKUs of the Product dimension. You can populate the relation from a file that correlates bar codes and SKUs.
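Conceptually, the relation acts as a lookup table from incoming values to existing dimension members. A minimal Python sketch (the bar codes and SKUs below are made up for illustration):

```python
# Hypothetical relation: incoming bar code -> existing Product SKU.
BARCODE_TO_SKU = {
    "0012345678905": "SKU-1001",
    "0098765432109": "SKU-2002",
}

def align(barcode):
    """Translate an incoming bar code to its Product SKU."""
    return BARCODE_TO_SKU[barcode]
```

An unknown bar code raises a lookup error, which corresponds to the non-matching case handled by the error-trapping template earlier in this chapter.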

