Saturday, December 26, 2009

SSIS (SQL Server Integration Services)

SQL Server Integration Services (SSIS) is one of the main issues of Microsoft BI (Business Intelligence) concept representing the data integration and data transformation in the enterprise level approach.

SSIS (SQL Server Integration Services) has the abilty to gather data from various resources in different kinds of formats, process this data, transform data and convert the processed data into any shape that you can use in your daily business as well as for data-mining and data warehouse applications.

By using SSIS, SQL Server developers and Database Administrators can reach data from XML files, text files, CSV files as well as relational databases.

By using SSIS, emailing data or many tasks like storing the processed data into a file folder can be managed easily bu not writing a line of code using the SSIS task packages.

Integration Services provides a platform to build data integration and workflow applications. The primary use for SSIS is data warehousing, as the product features a fast and flexible tool for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases, update multidimensional cube data, and perform other functions.

Introduction

Maybe the question should be what isn’t new with SQL Server Integration Services. SQL Server Integration Services or SSIS is a completely new product written from the ground up for SQL Server 2005. It replaces DTS, a feature introduced with SQL Server 7 and enhanced with the SQL Server 2000 release. SSIS, like DTS, is a tool to import, export, and transform data. If you haven’t had a chance to play with SSIS yet, you’ll be surprised at how different it is and how much more it can do.

Three dimensional

The first difference I noticed, besides the fact that the work to build packages is done is Visual Studio 2005, is that there are three layers in which to work: Control Flow, Data Flow and Event Handlers.

The Control Flow page is the main area used to build the package. On the Control Flow area you can add many new tasks. Some are containers to hold other tasks such as the For Loop Container. Others perform work like the Execute SQL Task. Some are entirely new like the group of maintenance plan tasks. The one task that is the heart of the Control Flow area is the Data Flow Task. Once you add a Data Flow Task you can drill down to the Data Flow layer. A Data Flow page is created for each Data Flow Task added to the Control Flow page.

On the Data Flow page you set up connections and transforms to import, transform, and export data. Once again, there are some Data Flow Source and Destination objects that you would expect to see and a few surprises, like the XML data source. The next surprise is the number of Data Flow Transformations available. From Aggregate to Fuzzy Lookup to Unpivot, I counted 28 in all. Figure 1 shows a typical Data Flow where some data is imported, a new column is derived, and the results exported to a file.

http://www.sqlservercentral.com/articles/whatsnewwithssis/2234/newssis_Fig1.jpg

Figure 1: A typical data flow

Event Handlers is the third of the three layers. It is possible to attach an event handler to any of several events on each task at the Control Flow level. The tasks available in the Event Handlers Toolbox are identical to the Control Flow Toolbox, even the Data Flow task. You can also assign event handlers to a task within an event handler. The good thing is that you have lots of flexibility, the bad thing is that your package could become extremely complex and hard to understand so use caution here. You can use the event handlers to handle errors or check something before a task executes.

More control

SSIS gives you many new ways to control the flow of your package that could only be done before by writing code. One of the coolest control features is the ability to set up looping within the package. Two tasks, the For Loop Container and Foreach Loop Container, are available for this purpose. The For Loop Container (see Figure 2) can be used to repeat a set of tasks a given number of times. The Foreach Loop Container can be used to perform the tasks on each item in a collection, a file in a folder perhaps.

http://www.sqlservercentral.com/articles/whatsnewwithssis/2234/newssis_fig2.jpg

Figure 2: A For Loop Container

The precedence constraints used to connect one task to the next have been enhanced as well. The control flow between tasks may be based on the value of an expression in addition to or instead of the outcome of the previous task (see Figure 3). Again, you have the ability to fine-tune your package with little or no code.

http://www.sqlservercentral.com/articles/whatsnewwithssis/2234/newssis_fig3.jpg

Figure 3: The Precedence Constraint Editor

A package may also be restarted from the task that failed if Checkpoints are enabled in a package. Basically, you configure a file to log information as the package runs (see Figure 4). Then set the FailPackageOnFailure property to True for each task. If your package fails, you correct the problem and restart the package. The package will start up from the point of failure allowing you to save time and resources over starting the package from the beginning.

http://www.sqlservercentral.com/articles/whatsnewwithssis/2234/newssis_fig4.jpg

Figure 4: How to enable Checkpoints

Variables and expressions

System variables for the package and each task are available, and you can set up user variables as well. For example, you need to set up a variable to count the loops when using the For Loop Container. User variables can be set up at the package level or the scope of the variable can be limited to a container or task.

The configuration dialog of each Control Flow task has an Expressions page. Each task property can be controlled dynamically by using expressions. Use variables and the built in functions to build an expression and assign it to a property. This replaces the Dynamic Property Task found in DTS.

Debugging

The ability to step through a package as it runs will save lots of time when troubleshooting errors. Breakpoints can be assigned to events on each Control Flow task allowing you to view the value of variables. Figure 5 shows how to make the For Loop Container break when the loop count equals five.

http://www.sqlservercentral.com/articles/whatsnewwithssis/2234/newssis_fig5.jpg

Figure 5: Breakpoints

In the Data Flow area, Data Viewers may be configured at each step to view the data as it is imported or transformed (see Figure 6). A graph can also be configured as a Data Viewer if that would make more sense. After viewing the data, click the green arrow to resume the package.

http://www.sqlservercentral.com/articles/whatsnewwithssis/2234/newssis_fig6.jpg

Figure 6: A Data Viewer

Easier coding

There are many tasks and transforms that will reduce the need for scripting. Luckily, when you must write some code, the Script Task uses the Microsoft Visual Studio for Applications environment complete with Intellisense to help you navigate the SSIS object model. Variables, as long as they are in scope, may be accessed by the script. To those of you who prefer C#, sorry, only Visual Basic.Net is allowed. Don’t get this task confused with the old ActiveX Script Task still available for converted DTS packages.

Upgrading existing DTS packages

A wizard is provided to convert existing DTS packages to SSIS. Don’t count on this to work 100% of the time, though, because there is not a one-to-one mapping between the DTS and SSIS objects. Many of you have packages that are pretty complex, and you may have had to jump through some hoops to accomplish your goals. The dtsrun command is still supported, and a task specifically for running DTS packages is included in the Control Flow Toolbox. Be sure to read Brian Knight’s article, Upgrading SQL Server 2000 DTS to SSIS for more information.