Control Flow and Data Flow Designer Tasks - SQL Server 2008

As mentioned earlier, an SSIS package is defined by tasks, connections, containers, and precedence constraints. A package can contain one or more tasks (also called executables).

SSIS has added several new tasks and enhanced others, and we will briefly review some of these tasks in this section. Different SSIS tasks will appear in the Toolbox window depending on the current design surface.

Control Flow Containers and Tasks

The figure below shows the Toolbox when you’re working in the Control Flow design surface. Let’s look at the available containers and tasks.

Toolbox for the Control Flow design surface

Containers

Containers provide a way to logically group tasks together for readability or dependency purposes. SSIS provides three Control Flow containers: the For Loop container, the Foreach Loop container, and the Sequence container.

Sequence Container

The Sequence container can contain one or more tasks, or other containers, allowing you to logically group tasks together. Because these tasks are contained within a Sequence container, they can be managed as a unit. The container provides visual benefits as well—by selecting the upper-right corner of a container, you can expand or collapse the detail of the tasks within, further enhancing readability of the package, as shown in the figures below.

An expanded Sequence container

A collapsed Sequence container

Using a Sequence container also allows you to do the following:

  • Enable or disable all tasks within the container (e.g., if you are debugging the package and do not wish to fire off the tasks within this container).
  • Isolate workflow by container (e.g., have a container that is executed upon success of the previous task, or a different container that is executed on failure of the previous task).
  • Define scope for a transaction, meaning that the rollback of a transaction can apply to all tasks within the Sequence container.
  • Connect to other containers or other tasks by precedence constraints. Tasks within a container can also be connected to each other in the same way.

For Loop and Foreach Loop Containers

Two containers allow for repeating control flow within a package:

  • For Loop: Like the Sequence container, the For Loop container allows you to place one or more tasks within it. Unlike the Sequence container, the For Loop container executes the tasks within it repeatedly, based on an evaluation condition. Once the condition returns false, execution of the tasks within the container ceases.
  • Foreach Loop: This container executes the tasks based on an iteration statement. The iteration statement can be based on rows in an external table, files in a folder, objects within a variable, SQL Server Management Objects (SMO), or even an XPath expression.

Let’s look at an example that uses the Foreach Loop container. Suppose that we wish to have a nightly process that evaluates error logs generated by SQL Server (under the C:\Program Files\Microsoft SQL Server\MSSQL\LOG directory). In our hypothetical scenario, an external application will evaluate each error file in that directory. After the external application finishes with a file, it will delete the corresponding row from the table. Our process is responsible for letting the application know which files need to be processed. In this exercise, we will explore this directory (in real time), finding each file within it and adding the filenames to a table of files pending processing.

  1. Create the PendingFileProcess table in a test database:
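     The CREATE TABLE statement itself is not reproduced in this copy. A minimal sketch, reconstructed from the two columns referenced later in this exercise (FileID and FileNM), might look like the following; the exact data types and sizes are assumptions:

     -- Reconstructed definition: FileID as an identity key, FileNM sized for file paths
     CREATE TABLE dbo.PendingFileProcess
        (FileID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
         FileNM nvarchar(255) NOT NULL)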
  2. Create a new SSIS project called Foreach loop example.
  3. In the new SSIS package, right-click the Control Flow design surface and select Variables. The Variables window will appear. (We will review variables in more detail later in this chapter, in the “Variables” section.)
  4. Click the Add Variable button (the one showing a function symbol with an orange star in the upper-left corner) (see the figure below).
    Adding a variable

  5. A new package variable appears. Change the Name field text to FileName and the Data Type field text to String.
  6. From the Toolbox, drag a Foreach Loop container onto the Control Flow design surface.
  7. Double-click the Foreach Loop container.
  8. In the Foreach Loop Editor dialog box, change the Name field to Error Files.
  9. In the left navigation pane, select Collection.
  10. Change the Enumerator value to Foreach File Enumerator. This means your task will loop through each file contained within a specified folder. Change the Folder field to the directory where your SQL Server error logs (or, for the purposes of this example, any readable text log files) are located.
    Foreach Loop Editor showing collection properties

  11. In the left navigation pane, select Variable Mappings.
  12. In the Variable section, from the drop-down list, select the variable User::FileName, which you created earlier. This variable will contain each filename found in the specified folder. The Foreach loop will loop through each file, changing the variable value at each pass.
  13. Click OK to exit the Foreach Loop Editor.
  14. In the Connection Managers area, create a new connection that points to the test database where you created the PendingFileProcess table in step 1.
  15. Drag an Execute SQL task into the Foreach Loop container named Error Files, as shown in the figure below.
    Foreach Loop container

  16. Double-click the Execute SQL task.
  17. In the Execute SQL Task Editor dialog box, change the Name field to Insert Error Files. In the Connection drop-down list, select the connection you created in step 14. Set the SQLStatement property to the following statement:
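     The statement itself is not reproduced in this copy. Given the ? placeholder described in the next step and the FileNM column mapped there, it was presumably equivalent to:

     -- Reconstructed INSERT: the ? placeholder receives the FileName variable value
     INSERT dbo.PendingFileProcess (FileNM) VALUES (?)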
  18. The question mark is a placeholder for a parameter mapping. In this case, you will map the FileName variable to this parameter.
  19. In the left navigation pane, select Parameter Mapping and click the Add button.
  20. Select the User::FileName variable from the drop-down list. Change the Data Type value to nvarchar (matching the FileNM column in the PendingFileProcess table). Change the ParameterName value to @P1.
  21. Click OK.
  22. Select Debug ➤ Start Debugging. This will execute the SSIS package, looping through each file in the error log directory. For each file, a new row will be inserted into the PendingFileProcess table. Check this table in Management Studio to confirm that the table was populated:

    SELECT FileID, FileNM FROM dbo.PendingFileProcess

  23. Select File ➤ Save All.

You will use this package in the upcoming Scripting task exercise.

Analysis Services Tasks

If you use Analysis Services, you’ll want to be aware of three tasks:

  • Analysis Services Execute DDL: This task allows you to submit Data Definition Language (DDL) statements to the selected Analysis Services connection.
  • Analysis Services Processing: This task processes Analysis Services objects (such as cubes and dimensions).
  • Data Mining Query: This task allows you to submit prediction queries based on Analysis Services data-mining models.

Data Flow Task

A Data Flow task represents an instance of the SSIS data flow engine executable. This executable is responsible for calling the data sources, data destinations, and any transformations that have been added to the Data Flow task.

As you saw in our first example in this chapter, when you add the Data Flow task to the Control Flow design surface and double-click it, you are directed to the Data Flow design surface. A Data Flow task can contain one or more Data Flow components (which we’ll review in the “Data Flow Designer Tasks” section later in this chapter). You can also have more than one Data Flow task on the Control Flow design surface.

As noted earlier, the Data Flow task describes how data moves from a data source (source adapter), and how the data is modified and mapped (transformation) on its way to a destination (destination adapter). In other words, you take data from somewhere, change it in some way, and then store the results somewhere else.

This mapping of source to transformation to destination is called a graph. A graph is a set of connected Data Flow components. One graph represents, for example, the data population of one table from another table. Unlike DTS 2000’s Transform Data task, a single Data Flow task can also update multiple tables (which is to say, it can contain multiple graphs).

Data Profiling Task

As SQL Server is used more and more for the largest enterprise data warehousing projects in the world, it becomes increasingly important for the core product to provide the majority of data-quality functionality required for these implementations. The new Data Profiling task helps to deliver this functionality. It provides support for eight different profiles: five that profile at the column level and three that profile relationships between columns.

The Data Profiling task provides the following data profiles for individual columns:

  • Column Length Distribution: Provides the distinct lengths for string values within the table, along with the percentage distribution for each length. This profile can help identify data-length problems that are common when migrating data from legacy systems that may not have enforced column lengths properly. Alternatively, this profile could be used to determine the required column lengths for a new table that will hold data from many different sources.
  • Column Null Ratio: Provides the percentage of null values within a column.
  • Column Pattern: Generates regular expressions for a percentage of rows within a table. This can be used to identify the number of different formats within a column for things like phone numbers, postal codes, and Social Security numbers. For example, if profiling a ZIP code column, you may be interested in how many entries have the expanded ZIP code format (with four extra digits). The likely regular expressions might be \d{5}-\d{4}, \d{5}, and \d{9}. Getting an expression like \d{15} would cause you to want to do further research.
  • Column Statistics: Provides statistical values for the selected column, such as standard deviation, minimum, maximum, or average. For example, this could be used to do bounds checks for test scores, where no score should be less than 0 percent or greater than 100 percent. You could also use this profile to check date ranges during a data warehouse load. For example, you probably wouldn’t expect an active employee to have a start date greater than the current date.
  • Column Value Distribution: Provides all the distinct values within the column along with their percentage distribution. If you were profiling a column that contained US states, then you would expect 50 or 54 distinct state values, depending on what is considered a state. If you received 70 distinct state values, then obviously there would be a data-quality problem.

The Data Profiling task provides the following data profiles for multiple columns:
  • Candidate Key: Indicates whether a particular column, or set of columns, can be used as a key or an approximate key.
  • Functional Dependency: Provides an indication of the dependency of one column on another. For example, if there is a product category and product subcategory relationship, this profile can determine whether a particular subcategory exists only within the product category. It wouldn’t be expected that the subcategory exists within many product categories.
  • Value Inclusion: Indicates the overlap between a set of columns. Continuing with the previous example, when profiling the product subcategory column within the product table, all values should exist within the product subcategory table.

The Data Profiling task generates an XML file that contains the results. This file can then be further parsed and used within a package, or the stand-alone Data Profile Viewer, available within the SQL Server program group, can be used to view the profile output externally from the package.

Execute Package Tasks

SSIS includes two separate tasks for executing SSIS packages: the Execute Package task and the Execute DTS 2000 Package task. Unfortunately, the upgrade path for packages created in SQL Server 2000 is less than perfect. Although best efforts to upgrade are attempted by the upgrade wizard, certain features have changed too significantly in SSIS to provide a smooth upgrade path from prior versions. This is not as bad as it may seem—many very complicated DTS packages are now far simpler to create within SSIS. However, to address compatibility issues, SSIS includes the Execute DTS 2000 Package task.

The Execute DTS 2000 Package task will allow the execution of a DTS 2000 package from within an SSIS package. This way, you can continue to use your older packages until you have an opportunity to rewrite them in the new version. The Execute Package task is used to execute other SSIS packages as part of the workflow. Both tasks allow you to encapsulate and reuse existing packages, rather than creating task or data flow redundancy within your package.

Bulk Insert Task

The Bulk Insert task imports text files into SQL Server tables. Although you cannot validate, scrub, or transform data using this task, it is usually the fastest method for importing data from a text file into SQL Server. Consider using the Bulk Insert task when data file import (to SQL Server) performance is important. We will demonstrate this task in the next exercise.
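Under the covers, the Bulk Insert task wraps the T-SQL BULK INSERT statement. For orientation, a hand-written equivalent of the import performed in the upcoming exercise might look like the following (the file path and delimiter settings here are assumptions):

    -- Hand-written equivalent of the Bulk Insert task's work
    BULK INSERT dbo.Staging_BookSale
    FROM 'C:\Apress\BookSales.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')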

Execute SQL Task

The Execute SQL task allows you to run SQL statements or call stored procedures against a SQL-compliant connection. For example, you can use this task to create a table, and then populate it with data based on a query.
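For instance, a single SQLStatement batch could both create and populate a table, along these lines (the table and column names here are hypothetical, purely for illustration):

    -- Hypothetical batch for an Execute SQL task: create a summary table,
    -- then populate it from a query
    CREATE TABLE dbo.BookSaleSummary
       (TitleID int NOT NULL,
        TotalSoldQuantity int NOT NULL)

    INSERT dbo.BookSaleSummary (TitleID, TotalSoldQuantity)
    SELECT TitleID, SUM(SoldQuantity)
    FROM dbo.BookSale
    GROUP BY TitleID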

In the following example, we’ll use both the Execute SQL and Bulk Insert tasks. We’ll demonstrate deleting data from a staging table and populating data from a text file into a SQL Server table using the Bulk Insert task.

The following exercise uses the BookSeller database and BookSales.txt file from earlier examples in this chapter.

  1. Create a new staging table (in Management Studio) in the BookSeller database:
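     The CREATE TABLE statement is not reproduced in this copy, and the column layout of BookSales.txt is not shown here. A placeholder sketch follows; the column names and types are assumptions and should be adjusted to match your BookSales.txt file:

     -- Hypothetical staging table definition; match the columns to BookSales.txt
     CREATE TABLE dbo.Staging_BookSale
        (ISBN char(13) NOT NULL,
         SoldQuantity int NOT NULL,
         SoldDT datetime NOT NULL)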
  2. In BIDS, create a new project called Import Book Sales.
  3. In the new SSIS package, create two new connections: one for the BookSeller database and the other for the BookSales.txt file.
  4. Drag an Execute SQL task onto the Control Flow design surface. Double-click the Execute SQL task.
  5. Under Connection, select the connection for the BookSeller database. For the SQLStatement property, type the following:
    DELETE dbo.Staging_BookSale
  6. Click OK.
  7. Drag a Bulk Insert task onto the Control Flow design surface. Double-click the Bulk Insert task.
  8. Change the Name property to Import BookSales file.
  9. In the left navigation pane, select Connection.
  10. Under Connection, select the BookSeller database connection. For the Destination table, select the dbo.Staging_BookSale table. Change the column delimiter to a comma. For the File property, select the BookSales.txt connection. Click OK.
  11. Click the Execute SQL task and drag the green arrow to the Import BookSales file task, as shown in the figure below. The green arrow means that the first task (deleting data from the dbo.Staging_BookSale table) must execute successfully before the BookSales.txt data is imported.
    Execute SQL and Import BookSales tasks

  12. Select Debug ➤ Start Debugging.
  13. Select Debug ➤ Stop Debugging.
  14. From Management Studio, within a new query window, execute the following query to confirm that the rows were loaded:
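     The confirmation query itself is not reproduced in this copy; any simple check against the staging table will do, for example:

     -- Confirm the bulk-loaded row count
     SELECT COUNT(*) AS LoadedRows FROM dbo.Staging_BookSale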

Execute Process Task

The Execute Process task allows you to execute a Win32 executable or a batch file within an SSIS package. For example, you could use this task to call a third-party application that performs business functions unrelated to SSIS task functionality. Be careful not to call executables that require user feedback; otherwise, your package execution may hang.

File System Task

The File System task allows you to perform several different file-based operations, all without needing to create custom code (for example, in DTS 2000, if you wanted to rename a file you would be required to either use xp_cmdshell or write an ActiveX script). The File System task allows you to copy a folder from one location to another, copy a file from one location to another, create or delete a folder in a specified location, delete all files and folders in a folder, delete a file in a specified location, move a file or directory from one location to another, rename a file in a specified location, and change file or directory attributes.

FTP Task

The FTP task allows you to both send and receive files using FTP. With the FTP task, you can send files, receive files, create or remove a local directory, create or remove a remote directory, delete local files, and delete remote files—all without writing a bit of code.

Maintenance Plan Tasks

SSIS includes a set of tasks designed to assist in the maintenance of your SQL Server database environment. These tasks are found under the Maintenance Plan Tasks group in the Toolbox, as shown in the figure below. They provide a flexible and enhanced alternative to manually creating plans within SQL Server Agent.

Maintenance Plan tasks in the Toolbox

Maintenance Plan tasks allow you to back up databases, check database integrity, defragment indexes (using ALTER INDEX ... REORGANIZE) or rebuild them (using ALTER INDEX ... REBUILD), execute SQL Server Agent jobs and T-SQL statements, clean up database history (backup and restore history, SQL Server Agent job history, and database maintenance plan history), notify operators, shrink databases, and update table statistics.
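For reference, the defragment and rebuild operations correspond to T-SQL statements like the following (the index and table names here are hypothetical):

    -- Defragment (reorganize) an index: online and lightweight
    ALTER INDEX IX_BookSale_TitleID ON dbo.BookSale REORGANIZE

    -- Fully rebuild the same index
    ALTER INDEX IX_BookSale_TitleID ON dbo.BookSale REBUILD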

Message Queue Task

The Message Queue task is used to send or receive messages between SSIS packages or custom application queues using Microsoft Message Queuing (MSMQ) functionality. These messages allow asynchronous communication between systems and applications, meaning that messages can be delivered to a queue, where they will await pickup and processing from the MSMQ service on the server or workstation. Once the message is received, the receiving SSIS package can process and act upon the received message. Messages can be sent in text, file, or variable format.

Send Mail Task

The Send Mail task relies on a valid SMTP server to send e-mail messages. If you have a valid SMTP server on your network, you will be able to use this task to send e-mail notifications on package failures or successes, specific task events, or reporting of SQL Server data (for example, reporting on the status of database file sizes). This task lets you define an e-mail’s To line, From line, CC line, BCC line, Subject line, and priority level, as well as include file attachments. The e-mail body (called the message source) can be directly entered text, or it can be based on the value of a variable or an external file.

Scripting Tasks

SSIS includes two separate scripting tasks: the ActiveX Script task and the Script task. The ActiveX Script task is a carryover from DTS 2000. It allows you to write scripts using the VBScript or JScript language to define logic within packages, write functions, access data using ActiveX Data Objects, perform computations, and set variables.

The ActiveX Script component will be removed from the next version of SQL Server, so you should remove it from your packages as soon as possible. The Script task should be used in its place. Unlike the ActiveX Script task, the Script task allows you to write your scripts with the full-featured Visual Basic .NET or C# programming languages. Scripts are written in the Visual Studio Tools for Applications (VSTA) environment, which includes IntelliSense, debugging, Object Explorer, and integrated help modules. Using the .NET Framework and COM objects offers obvious performance and functionality benefits.

In the following example, you will use the Script task to combine multiple files in a directory into a single file (so that it can be imported by the hypothetical application). You will reuse part of the project and package you created for the earlier example demonstrating the use of the Foreach Loop container. As you recall, this package looped through each error file in the C:\Program Files\Microsoft SQL Server\MSSQL\LOG directory. For each file it found, a row was inserted into the PendingFileProcess table.

With the Script task, you can now perform actions that are not natively available within the SSIS environment. In this example, you have been asked to loop through each error file and merge them into a new, single, separate file. This way, the external (hypothetical) application can instead load all errors from a single file.

  1. In the SSIS package from the earlier exercise, delete the Execute SQL task contained within the Foreach Loop container.
  2. Drag a Script task into the Foreach Loop container.
  3. Double-click the Script task.
  4. Change the Name field to Merge Files, as shown in the figure below.
    Script Task Editor dialog box

  5. In the left navigation pane, select Script and change the ScriptLanguage property to Microsoft Visual Basic 2008.
  6. Click the Design Script button.
  7. In the Imports area of the script, add the following line:

     Imports System.IO

    This will allow you to access file-streaming functionality.

    Within the Public Sub Main() procedure, add the following code:
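     The original listing is not reproduced in this copy. A minimal sketch of the completed procedure follows; it assumes the FileName variable has been added to the task’s ReadOnlyVariables list and that the C:\Apress directory already exists:

     Public Sub Main()

         ' Grab the value of the FileName variable, which the Foreach Loop
         ' container sets to the current error log file on each pass
         Dim sourceFile As String = Dts.Variables("FileName").Value.ToString()

         ' Read the contents of the current error log file
         Dim fileContents As String = File.ReadAllText(sourceFile)

         ' Append the contents to the single merged log file
         File.AppendAllText("C:\Apress\MergedErrorLog.txt", fileContents)

         ' ScriptResults is the enum declared in the VSTA-generated template
         Dts.TaskResult = ScriptResults.Success
     End Sub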

    This code is commented so you can better understand what it does. In a nutshell, the script grabs the value of the FileName variable (which will be populated by the Foreach Loop container). This is the file whose contents will be appended to the single C:\Apress\MergedErrorLog.txt file.

  8. Select File ➤ Save to save the script.
  9. Select File ➤ Close and Return.
  10. Click OK to exit the Script task.
  11. Select Debug ➤ Start Debugging to test the package.

If the package succeeded, both the Foreach Loop container and the Script task will turn green, as shown in the figure below. You should also find a populated C:\Apress\MergedErrorLog.txt file containing the contents of all error logs from the error log directory.

Foreach Loop container and script example after execution

Web Service Task

The Web Service task executes a web method using an HTTP connection. Within the task, you designate the HTTP connection and the Web Services Description Language (WSDL) file (which specifies the location of the service and the functionality of the web service).

Web Service task input parameters can be specified through strings within the task or, preferably, by using variables. Similarly, the results from a Web Service task can be written to a file or to a variable.

WMI Tasks

WMI is the programmatic interface provided by Microsoft to allow users to monitor and control Windows system resources. SSIS includes two tasks that tie into the WMI interface: the WMI Data Reader task and the WMI Event Watcher task.

The WMI Data Reader task allows you to execute a WMI Query Language (WQL) query against a specific system (for example, retrieving the application event log file). The WMI Event Watcher task allows you to execute a WQL query that watches for a specific event to happen (for example, being notified when logical disk space falls beneath a certain threshold). The WMI Event Watcher task can also be used to monitor for the existence of a file or files within a specific directory. This is a common requirement to make nightly batch loading a little more flexible.

XML Task

The XML task allows you to perform operations against XML documents. Some of the operations include the ability to compare XML documents (writing the differences to an XML diffgram document), merge two XML documents into one document, query and evaluate an XML document using the XPath language, and transform an XML document based on an XSLT style sheet.

Data Flow Designer Tasks

Before using the Data Flow design surface, you must first add a Data Flow task to the Control Flow design surface. Once you have added the Data Flow task, you can begin adding items (from the Toolbox) to the Data Flow design surface.

Data Flow components fall into three categories:

  • Source adapters: These access data from different types of data sources, making the data available to the Data Flow task. Source adapters produce outputs, meaning they feed output columns and rows to the receiving destination adapter or transformation.
  • Transformations: These are tasks used within the data flow of a package to modify data per your particular business needs. Transformations require inputs, meaning they expect an input of columns and rows from either another transformation or a source adapter. Transformations also provide outputs that contain columns and rows, making them available to other transformations or destination adapters.
  • Destination adapters: These write data in a data flow to different types of data sources. Destination adapters use inputs to receive columns and rows from source adapters or transformations.

Source Adapters

Source adapters define the source of the data; the specific table, query, or view when applicable; and the specific columns that need to be extracted. In SSIS, you extract data using the source adapters listed in the table below.

SSIS Source Adapters

Destination Adapters

Destination adapters define the endpoint of extracted data, receiving it either directly from a data source adapter or from a transformation. In SSIS, you load data using the destination adapters shown in the table below.

SSIS Destination Adapters

Transformations

SSIS provides several new and enhanced transformation tasks, shown in the table below, that are used to modify data as it moves from the source data adapter to the destination data adapter.

SSIS Transformations

Export Column Transformation Example

For this example, imagine you’ve been asked by a hypothetical third-party catalog provider to generate thumbnail image GIFs for all of the products in the AdventureWorks database, for use in the third party’s online sales application.

  1. Create a new project and package in BIDS.
  2. Create a new OLE DB connection manager to the AdventureWorks database.
  3. Drag a Data Flow task onto the Control Flow design surface. Double-click it.
  4. From the Toolbox, drag an OLE DB source onto the Data Flow design surface. Double-click it.
  5. In the OLE DB Source Editor dialog box, under Connection, select the SQL Server instance containing the AdventureWorks database. For the Data Access Mode option, select SQL command.
  6. For the SQL command text, enter the following query:
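     The query itself is not reproduced in this copy. Given the columns selected in the following steps (ThumbnailPhoto and PhotoFileName), the C:\Apress output directory, and the 101-row result noted after the exercise, it presumably queried the Production.ProductPhoto table along these lines:

     -- Reconstructed source query: one row per product photo, with the
     -- full target path for each generated thumbnail file
     SELECT ThumbnailPhoto,
            'C:\Apress\' + ThumbnailPhotoFileName AS PhotoFileName
     FROM Production.ProductPhoto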

  7. Click OK.
  8. From the Toolbox, drag an Export Column transformation onto the Data Flow design surface.
  9. Click the OLE DB source and drag the green data flow path (the green arrow) to the Export Column task.
  10. Double-click the Export Column task.
  11. In the Export Column Transformation Editor, under Extract Column, select ThumbnailPhoto; this column supplies the data written to each file. Under File Path Column, select PhotoFileName; this column defines the actual name of each file generated. The figure below shows these selections.
    Export Column Transformation Editor dialog box

  12. Leave the other options unchecked, and click OK.
  13. Select Debug ➤ Start Debugging.

You should see that 101 rows were processed by the Export Column task.

Export Column package after execution

The files should have been generated to your C:\Apress directory. You can then use the FTP task to send the files to the client.

Row Sampling Transformation Example

For this example, your company’s HR department would like you to generate a text file containing a random sampling of 20 employees who will be asked to participate in an employee survey. You’ll use the AdventureWorks table HumanResources.Employee for the SQL Server source data, and you’ll use the Row Sampling transformation to export the sampling to a text file.

  1. Create a new project and package in BIDS.
  2. Create a new connection manager pointing to a SQL Server instance hosting the AdventureWorks database.
  3. Drag a Data Flow task onto the Control Flow design surface. Double-click it.
  4. Drag an OLE DB source to the surface area. Double-click it. For the name of the table or the view, select HumanResources.Employee.
  5. Select Columns from the left navigation pane. In the Available External Columns check box list, deselect all fields except EmployeeID, ManagerID, and HireDate. Click OK.
  6. Drag a Row Sampling transformation onto the Data Flow design surface.
  7. Click the OLE DB source and drag the green data flow path to the Row Sampling transformation.
  8. Double-click the Row Sampling transformation. In the Row Sampling Transformation Editor, change the number of rows to 20. Then click OK.
  9. Drag a Flat File destination onto the design surface.
  10. Click the Row Sampling transformation and drag the green data flow path to the Flat File destination. In the Input Output Selection dialog box, select Sampling Selected Output (this feeds the 20 selected rows, rather than the remaining unselected rows). Click OK.
  11. Double-click the Flat File destination. Click the New button next to the Flat File Connection Manager drop-down.
  12. In the Flat File Format dialog box, select Delimited. For the filename, select C:\Apress\EmployeeSampling.txt (it doesn’t need to already exist). Check the “Column names in the first data row” check box. Click OK.
  13. Back in the Flat File Destination Editor, in the left pane, select Mappings. Verify that the input columns map to the destination columns appropriately. Click OK.
  14. Select Debug ➤ Start Debugging.

After debugging the package, check the contents of the C:\Apress\EmployeeSampling.txt file for the 20 sampled employees. Also, if you debug the package again, you’ll notice that you get a different set of employees exported to the file.

