Skip to main content

SQL Server Integration Services (SSIS) Fundamentals

SQL Server Integration Services (SSIS) is a Business Intelligence tool used by Database Developers or Administrators to perform Extract Transforms and Load (ETL) Operations



Initially before SSIS, it was Data Transformation Services (DTS). Evolution of DTS led to currently used SSIS.

The SSIS designer will always have three tabbed pages 
1) Control Flow,
2) Data Flow, and 
3) Event handlers. 


An additional tab pops up during execution to show the progress. 


Package Explorer: A Package Explorer tab displays the contents of the package 
The Execution Results tab shows the results after execution. 


Connection Managers: Since databases are at the heart of SSIS, a control tray of Connection Managers will display all the used connections and available connections for the package.


Tool box:
The SSIS toolbox contains the various items needed for Control Flow as well as tasks needed for maintenance plans.

In SQL Server 2008 Integration Services, There are around
  • 28 Different control Flow tasks
  • 11 Maintenance Plan task
CONTROL FLOW
The Control Flow Tasks within SQL Server Integration Services (SSIS) can be grouped into 9 major categories based on usage.

The different categories for control flow tasks are listed below:
  1. Container Tasks
  2. Data Flow Tasks
  3. Data Preparation Tasks
  4. Workflow Tasks
  5. SQL Server Tasks
  6. WMI Tasks (Windows Management Instrumentation)
  7. Scripting Tasks
  8. Backward Compatibility Tasks
  9. SQL Server Analysis Services Tasks
CONTAINER TASKS

Container control flow tasks within SSIS can be used by database developers or database administrators when providing a structure to the package and to repeat control flows within the package. 

It can also be used to group tasks.
Different types of container tasks are 
  1. For Loop Container
  2. Foreach Loop Container
  3. Sequence Container and
  4. Task Host. 











Regarding each will be explained in following posts.

We can also group two or more tasks together.
To do this, just select tasks you need to group and right click and select Group option.


In the following pic, I have grouped dataflow task and Script tasks.














DATA FLOW TASKS:


Data Flow tasks at SSIS is mainly used to move/copy data from Source to required destination.


Data Flow tasks give accessibilities to use transformations.



Transformations are business logics which are used to make inconsistent data to consistent format


Using transformation, we are going to clean data so that it is useful format.


We can perform Extract, Transform & Load (ETL) operations using Data Flow task.


DATA PREPARATION TASKS: 


Data preparation tasks in SSIS can be used to Fetch the data or validate it to determine the quality of data


The different types of data preparation tasks. They are as follows

  • File System Task
  • FTP Task
  • Web Service Task
  • XML Task and 
  • Data Profiling Task.


SQL Server Tasks:


SQL Server tasks in SSIS can be used to copy, modify and delete SQL Server objects and data.








Use the Bulk Insert Task to insert data from a text file or a flat file into an SQL Server database table
(In the same way that the BULK INSERT statement or the BCP.EXE command line tool is used) 

The Execute SQL Task can be used to execute any TSQL code such as truncating a staging table, running queries, stored procedures etc. 

The Transfer Database Task can be used to transfer database objects from one database to another. 
(sys objects will contain systables,procedures,primarykey foreign key,user table)

The Transfer Error Message Task can be used to transfer user defined error messages from one instance of SQL Server to another.

The Transfer Job Task can be used to transfer SQL Server Agent Job between instances of SQL Server. 

The Transfer Login Task can be used to transfer logins between instances of SQL Server. 

The Transfer Master Stored Procedure Task can be used to transfer user defined stored procedures between the MASTER databases on instances of SQL Server. 

The Transfer SQL Server Objects Task can be used to transfer tables or stored procedures between instances of SQL Server

WMI Tasks:- 


Windows Management Instrumentation (WMI) Tasks is used to read WMI data and watch for WMI events. (WMI- set of environment-independent specifications which allow management information to be shared between management applications).

The WMI Data Reader Task in SSIS can be used to run WQL queries against the windows management instrumentation to read the event logs, determine hardware information etc.
The WMI Event Watcher Task in SSIS helps SSIS to wait for certain WMI events and then respond to the event.






Scripting Tasks:– Scripting tasks within SSIS can be used to perform certain functions which are not fully supported by the existing tasks in SSIS.





Backward Compatibility Tasks:–
Backward compatibility tasks within SSIS can be used to execute legacy application scripts with the SSIS packages. (legacy application- An application in which a company or organization has already invested considerable time and money. Typically, legacy applications are database management systems (DBMSs) running on mainframes or minicomputers. An important feature of new software products is the ability to work with a company's legacy applications, or at least be able to import data from them.) The types of tasks supporting Backward Compatibility Tasks are ActiveX Script Task and the Execute DTS 2000 Package Task.


The ActiveX Script Tasks is included in SSIS 2005 and SSIS 2008 for backward compatibility with DTS 2000.
The Execute DTS 2000 Package Tasks can be used to execute DTS packages (SQL Server 2000) from within an SSIS 2005 or SSIS 2008 packages.





SQL Server Analysis Services Tasks: – 
SQL Server Analysis Services Tasks within SSIS can be used to create, modify, delete and process Analysis Services objects. The different types of Analysis Services tasks are

  • Analysis Services Execute DDL Task, 
  • Analysis Services Processing Task and 
  • Data Mining Query Tasks.
The SQL Server Analysis Services Execute DDL Task is similar to Execute SQL Task, however use of Analysis Services Execute DDL Task can issue Data Definition Language statements against an Analysis Services system. The DDL statements can be used to create cubes, dimensions, KPI’s or any other analytical processing OLAP objects.
The Analysis Services Processing Task can be used to process analysis services objects such as cubes, dimensions and mining models.
The Data Mining Query Task can be used to run prediction queries based on data mining models built in analysis services. A prediction query creates a prediction for new data by using the mining models chosen.






Maintenance Plan Tasks
In SQL Server 2008 Integration Services there are around 11 Maintenance Plan Tasks available. Database Administrators can create database maintenance plans either by using Maintenance Plan Wizard or by using SSIS designer. 
The Maintenance Plan Wizard helps you set up the core maintenance tasks to make sure that your database performs well, is regularly backed up, and is free of inconsistencies. The Maintenance Plan Wizard creates one or more SQL Server Agent jobs that perform these tasks on local servers or on target servers in a multi server environment. Execution can be at scheduled intervals.
To create or manage maintenance plans, you must be a member of the sysadmin fixed server role. Note that Object Explorer only displays maintenance plans if the user is a member of the sysadmin fixed server role. However, to create an enhanced workflow it is advised to create maintenance plan using SSIS designer.




A DBA can create a database maintenance plan either by using the Maintenance Plan Wizard or by using the SQL Server Integration Services (SSIS) designer. By using the Maintenance Plan Wizard you can create a very basic maintenance plan for all the system and user databases. However, if you want to create an enhanced workflow then it is advised to create a maintenance plan using the SSIS designer.


1)Back Up Database Task:–  Back Up database task different types of database backups to be performed; such as Full, Differential or Transactional Log backups based on the Recovery Model of the System or User databases. In the Maintenance Plan Wizard this task is subdivided into three individual tasks namely Back Up Database (Full), Back Up Database (Differential) & Back Up Database (Transaction Log). If you have chosen only a single database then it will even allow you to perform file and filegroup backups. Once the database backup has completed successfully within this task there is an option to verify the database backup integrity.



1) FULL Database Backups: Will take full database backup.This includes part of the transaction log so that the full database backup can be recovered. Full database backups represent the database at the time the backup finished.


2) Differential Database Backups: will take backup from This full backup is called the differential base. Differential database backups are smaller and faster than full database backups. This saves backup time at the cost of increased complexity. For large databases, differential backups can occur at shorter intervals than database backups. This reduces the work-loss exposure.
Differential database backups are especially useful if a subset of a database is modified more frequently than the rest of the database. In these cases, differential database backups enable you back up frequently without the overhead of full database backups.


3)Transaction log backups: The transaction log is a serial record of all the transactions that have been performed against the database since the transaction log was last backed up. With transaction log backups, you can recover the database to a specific point in time (for example, prior to entering unwanted data), or to the point of failure.


When restoring a transaction log backup, Microsoft® SQL Server™ rolls forward all changes recorded in the transaction log. When SQL Server reaches the end of the transaction log, it has re-created the exact state of the database at the time the backup operation started. If the database is recovered, SQL Server then rolls back all transactions that were incomplete when the backup operation started.


Transaction log backups generally use fewer resources than database backups. As a result, you can create them more frequently than database backups. Frequent backups decrease your risk of losing data.


2) Execute SQL Server Agent Job Task:–  Execute SQL Server Agent Job Task can be used to run SQL Server Agent Jobs that are created on the SQL Server Instance.


3) Execute T-SQL Statement Task:– Execute T-SQL Statement Task can be used to execute some Transact SQL queries against databases. Example : Drop database database_name; etc


4) History Cleanup Task:-  History Cleanup Task deletes the historical data related to database backups and restore activities; SQL Server Agent Job history, database maintenance plan history etc. This task basically uses sp_delete_backuphistory system stored procedure to clean up the history prior to the number of days, weeks or months from the current system date. Requires membership in the sysadmin fixed server role, but permissions can be granted to other users.
(Reduces the size of the backup and restore history tables by deleting the entries for backup sets older than the specified date. Additional rows are added to the backup and restore history tables after each backup or restore operation is performed; therefore, it is recommend that you periodically execute sp_delete_backuphistory)
NOTE: The backup and restore history tables reside in the msdb database.


5) Maintenance Cleanup Task:– Maintenance Cleanup Task can be used to remove the older files like maintenance plan execution reports, database backup files etc. Use this task when creating maintenance plans it will remove old files which are not required. 


6)Notify Operator Task:– Use the Notify Operators Task dialog to add an automatic notification to this maintenance plan. To use this task you must have Database Mail enabled and properly configured with MSDB as a Mail Host Database, and have a Microsoft SQL Server Agent operator with a valid e-mail address.
This task uses the sp_notify_operator stored procedure.

7) Rebuild Index Task:- Use the Rebuild Index Task dialog to re-create the indexes on the tables in the database with a new fill factor. The fill factor determines the amount of empty space on each page in the index, to accommodate future expansion. As data is added to the table, the free space fills because the fill factor is not maintained. Reorganizing data and index pages can re-establish the free space. Rebuild Index task can be used to organise data on the data and index pages by rebuilding indexes. This helps to improve the performance of index seeks and scans.


The Rebuild Index Task uses the ALTER INDEX statement when connected to SQL Server 2005 or higher, and the DBCC DBREINDEX statement when connected to SQL Server 2000.


8)Reorganize Index Task:- Use the Reorganize Index Task dialog to move index pages into a more efficient search order. This task uses the ALTER INDEX REORGANIZE statement with SQL Server 2008 databases, and DBCC INDEXDEFRAG with SQL Server 2000 databases.


9) Shrink Database Task:– Shrink Database Task can be used to reduce the disk space which is consumed by the database and log files by removing the empty data and log pages. Using this task the space gained after shrinking the database can either be returned to the operating system or it can be retained within the database for its future growth. This task executes the DBCC SHRINKDATABASE statement.







Comments

Post a Comment

Popular posts from this blog

BIG Data, Hadoop – Chapter 2 - Data Life Cycle

Data Life Cycle The data life cycle is pictorial defined as show below:     As we see, in our current system, we capture/ Extract our data, then we store it and later we process for reporting and analytics. But in case of big data, the problem lies in storing and then processing it faster. Hence Hadoop takes this portion, where it stores the data in effective format (Hadoop distributed File System) and also process using its engine (Map Reduce Engine). Since Map Reduce engine or Hadoop engine need data on HDFS format to process, We have favorable tools available in market to do this operation. As an example, Scoop is a tool which converts RDBMS to HDFS. Likewise we have SAP BOD to convert sap system data to HDFS.

How to Copy or Move Multiple Files from One Folder to Another Folder using Talend

Hello all, In this Post, I will explain how to move Multiple Files from One Folder (Say Source) to Other folder (Say Destination). This Post will also helps you to understand How to Declare Variable and Use it. To Declare a variable, We are go to use Contexts option in repository. Lets say we have two .txt files in Path D:/Source/ . My Requirement is to move the files from Source Folder ( D:/Source/ ) to Destination Folder ( D:/Dest/ ). Step 1: Open a New job Step 2: Now right click and Create a New Contexts from Repository. Give some Name and give Next. Step 3: Now Fill in the Source Directory Details where the loop on files should happen as shown in the snippet and give finish. Step 4: Now Context is created and The values will be changing based on each file in Folder. Step 5: Click and Drag the context from Repository to Context Job Window below the Job Designer. Step 6: If we Expand the Contexts, We can find the variable SourcePath is holdi...

How to move multiple files in ssis and also rename simultaneously

There are two ways to achieve this. 1) We can move the flat files and then rename it. 2) While moving files itself, automatic rename should be done. We will do the second type. The criteria is to rename the files while moving from source to destination. So for that, we need FILE SYSTEM TASK to be included. Secondly since we need to move many files, we will use FOR EACH LOOP CONTAINER. To fetch all the files, we can use FOR EACH LOOP task in SSIS. In collection tab, we can select FOREACH FILE enumerator option for fetching files and we can change enumerator configuration Folder option: Points to source where we need to fetch files. Files: will give us idea whether we need to fetch all the files (*.*) or if we give extension like *.txt, it is going to fetch only  .txt files . Once I give Source name in FOR EACH LOOP container, It is going to fetch all the files corresponding to that path. Retrieve file name: This option is used to let the variables mentioned in V...