Skip to content

Creating a Solid SSIS ETL Solution: Part 1 – Package and Error Logging

November 14, 2010

Creating a Solid SSIS ETL Solution: Navigation
Part 1. – Package and Error Logging
Part 1. – Package and Error Logging (cont.)
Part 2. – Package Configurations
Part 3. – Package Template Overview
Part 4. – Email Task and Package Execution Report

SSIS as an ETL solution

SSIS is a very powerful ETL tool, which comes packaged with SQL Server 2005+. It comes packed with many powerful conversion tools and connection types for almost any ETL process. For any DBA or BI professional thinking about using SSIS as their primary ETL solution, a good template and logging/error tracking system is essential to create a solid ETL environment. While SSIS does come pre-packaged with a logging system, it just doesn’t cover all the bases. SSIS also does not come with a basic template.

Before I go any further, all scripts, the package template, RS package execution report, and test data can be found in the associated zip file linked at the end of this post. Before trying to use the files, make sure you run through at least the schema and configuration scripts, else when the package is opened it will error out as it tries to load the package configurations for my personal test system.

Recently I was tasked to start moving data from some flat file downloads my company gets from various vendors from across the country via FTP into specified tables in our database. Having never used SSIS to ETL data, I didn’t have any scripts or templates already built to create a solid ETL solution.

For the next few weeks I’ll be going over the logging system, package template, and package execution report I created to build a solid base for using SSIS as our ETL solution.

Custom Logging/Error Tracking System

The very first thing I wanted out of this system was execution logging and error message/data capture. I did find that SSIS does come packaged with some built in logging, but the logging is static, doesn’t collect all the error information I want, nor does it capture any rows which error out during transformation or loading. It can be very difficult to troubleshoot an automated process if the proper error messages and data which caused the error are not captured. For these reasons I decided to create my own custom SSIS logging solution.

First I needed to set up the schema, which would capture the logging information. The logging associated with all SSIS package executions are captured in the AdminDB database in four simple tables. The database and table drop/create scripts can be found in the SSISPackageLoggingSystemSchema folder. To create the table structure, run the AdminDB database creation script and then run each table creation script in SSMS in the following order:

    1. SSISPackageLogResultType
    2. SSISPackageLog
    3. SSISPackageLogDetail
    4. SSISPackageLogDetailErrorRecord

Each table performs a specific purpose. The SSISPackageLog table captures every execution of a package on the system. It captures the package name, start and end times, and the general result of the package’s execution. The SSISPackageLogResultType table is a simple list of package result types: success, data loading errors, and system failure (package failure). The SSISPackageLogDetail table records every data flow task execution and it’s details, such as: parent package ID, data flow task name, start and end times, successful row count, business failure row count, system failed row count, and the result of the data flow task (same result types as at the package level). The SSISPackageLogDetailErrorRecord records each row of data which errors out of the ETL process for whatever reason, usually a data type incompatability or out of range issue. It captures parent data flow task ID as well as a string representing the exact row of data that failed the data flow process.

In order to connect these tables with our package template, I created five stored procedures, which can be found in the SSISPackageLoggingSystemProcs folder. There are package log and package log detail start and end sprocs as well as a sproc for recording any package errors.

The sprocs themselves are not too important. How they are used in the SSIS template is.

Starting the logs

Right away when package execution begins, the first task in the package starts the logging process. As you can see in the following screen shot, the package log start task sends the package name and start time as parameters to the SSISPackageLogStart sproc. A record is inserted in the SSISPackageLog table, and the ID of that record is sent back to the package and stored, so the package log end task can update it with the results of execution.

Photobucket

Next up in the logging process is the package log detail start task, which fires off just before any data flow task runs (place a copy of this task before each data flow). As can be seen in the associated image below, this task and sproc are much like the package log start duo from above. This task passes the associated package log row ID, data flow task name, and path to the data source as parameters to the SSISPackageLogDetailStart sproc, which inserts a row into the SSISPackageLogDetail table. The resulting ID is then passed back to the package, which it again will use to update that same record with the results of the data flow task execution.

Photobucket

Recording Data Flow Counts and Logging Error Rows

Now that we’ve got the log records started, the package delves into the data flow, where it contains redirection paths for data records from the source which do not fit into the ETL process or final destination table. As can be seen in the image below, each record which does not fit is converted into a generic string and recorded in the SSISPackageLogDetailErrorRecord table. The error data flow destination records the ID of the current data flow task as well as the erroneous row data string. The basic approach to this error pipe is to union the error strings from any conversion task, and then pass it to the destination table. When you add a conversion task simply add another derived column task, create the error row concatenation in the task, and then add it to the union all task.

Photobucket

If you noticed the data flow task is counting how many successful rows, business failure rows, and system failure rows are passing through the flow. It’s collecting these counts into three package variables: v_SSISPackageLogRowCountSuccesses, v_SSISPackageLogRowCountBusFailures, v_SSISPackageLogRowCountSysFailures. Once the data flow task ends, this information is used to change the data flow task result type and message if any business or ‘system’ failure rows were recorded.

Time to close up shop

After the data flow task and the script task which follows have finished executing, the row counts, data flow task ID, result ID, and result message are sent back to the SSISPackageLogDetail table via the SSISPackageLogDetailEnd procedure, as is shown in the following image. The SSISPackageLogDetailEnd sproc uses this information to update the package log detail record started by the package log detail start task with end time, row counts, result id, and message, effectively closing out the data flow logging process.

Photobucket

Finally once the package has finished archiving and deleting the original source file, the Package Log End task fires. It sends the Package Log ID and package execution result ID to the SSISPackageLog table via the SSISPackageLogEnd sproc, which updates the original record inserted by the Package Log Start task with the end time and result type ID. This closes out the entire logging process, and from here the package sends out an email with execution results.

Photobucket

Next: Logging Errors

Now that the logging schema, start/end process, and data flow error capture is all set up, next I’ll show how I handle and record package errors. ==>

Example Files

Files and Test Data

Advertisements
11 Comments
  1. John permalink

    Any chance you can post the example files again? They seem to be gone from swoopshare.

  2. Jonathan Amen permalink

    Yep the link seems to be broken. I’ll see what I can do tonight. Thanks for the heads up!

  3. Jonathan Amen permalink

    There. Try the link again.

  4. John permalink

    Got an “Invalid or Deleted File” message.

  5. Jonathan Amen permalink

    I’ve reloaded the zip file. It contains the updated report and sprocs. Try the link again, please. I’ve tried it on many of the posts, and can get a download.

  6. John permalink

    It worked this time. Thanks a lot, Jonathan!

  7. Eugene Liu permalink

    Do you have sql 2005 version ?

  8. Jonathan Amen permalink

    Sorry. I don’t. I’m not sure what it would take to make this 2005 compatible.

  9. Henrik permalink

    Why you don’t use “Event handler” PreExecute, PostExecute and OnError.
    Scope these Event Handlers to the whole package and you can record all these events. And so you don’t need to fill up your clean dataflow with logging steps.

  10. Hem-Art permalink

    Nice Post..Thank You so Much..It will help me in my project…

  11. Rajesh permalink

    Hi Jonathan amen,
    I am Rajesh.Mirsala,I have been through your article its really very helpful

    Thanks a a lot…
    Keep going…………….
    Bye…

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: