
Use of the MERGE Command: Update if Changed, Insert When Not Exists, Delete if Not Found in Source (Incremental Load) / UPSERT Command

Hello All,

Today we will look at how to achieve the incremental load concept, familiar from SSIS, using plain SQL.

SQL Server provides the MERGE command to accomplish this task.

We shall start off by creating two tables: 
  1. Source (Src)
  2. Destination (Dest)

Create table Src (ID Int, Name varchar(100), Designation Varchar(100))
Insert Into Src
Select 1, 'Dhina', 'Snr.Analyst'
Union
Select 2, 'Scott', 'Lead Analyst'
Union
Select 5, 'Peter', 'Jnr. Analyst'

Create table Dest (ID Int, Name varchar(100), Designation Varchar(100))
Insert Into Dest
Select 1, 'Dhina', 'Analyst'
Union
Select 2, 'Scott', 'Lead Analyst'
Union
Select 3, 'Brad', 'Test Analyst'


Select * From Src
Select * From Dest
Go

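These queries return the following rows (the row order is not guaranteed without an ORDER BY):

Src
ID  Name   Designation
1   Dhina  Snr.Analyst
2   Scott  Lead Analyst
5   Peter  Jnr. Analyst

Dest
ID  Name   Designation
1   Dhina  Analyst
2   Scott  Lead Analyst
3   Brad   Test Analyst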





As shown, we need to update the employee 'Dhina' in the destination table, because she was promoted from Analyst to Snr.Analyst.
We need to insert the new employee 'Peter', because he is present in the source table "Src" but not in the destination table "Dest".
We shall also delete 'Brad', whose record is found in the destination table "Dest" but not in the source table "Src".

In this way, we keep the source table and the destination table synchronized.
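
The three cases above can be previewed before changing anything. The query below is a minimal sketch, assuming the Src and Dest tables created earlier; the column aliases EmployeeID and RequiredAction are names chosen here for illustration. It also assumes the compared columns contain no NULLs, as in the sample data, since a comparison with NULL never evaluates to true.

```sql
-- Preview which action each row would need to bring Dest in line with Src.
-- Assumes dbo.Src and dbo.Dest exist as created above.
SELECT
    COALESCE(s.ID, d.ID) AS EmployeeID,
    CASE
        WHEN d.ID IS NULL THEN 'INSERT'      -- only in Src
        WHEN s.ID IS NULL THEN 'DELETE'      -- only in Dest
        WHEN s.Name <> d.Name
          OR s.Designation <> d.Designation THEN 'UPDATE'  -- in both, values differ
        ELSE 'NO CHANGE'                     -- in both, values identical
    END AS RequiredAction
FROM dbo.Src AS s
FULL OUTER JOIN dbo.Dest AS d
    ON s.ID = d.ID
ORDER BY EmployeeID;
```

With the sample data, ID 1 is reported as UPDATE, ID 2 as NO CHANGE, ID 3 as DELETE and ID 5 as INSERT.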

We have two types of loads: 
  • Full Load
  • Incremental Load
Full Load: Drop or truncate the entire destination dataset and reload everything from the source. This is not recommended when the table holds a huge number of records, or when other objects depend on the destination table.

This method is simple, but it takes a lot of time, and the destination/target table is unavailable while the reload is running.
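
For comparison, a full load of the sample tables could look like the sketch below. It assumes the Src and Dest tables created earlier and is only one possible way to write it. Note that TRUNCATE TABLE fails if the table is referenced by a foreign key, which is one form of the dependency problem mentioned above.

```sql
-- Full load sketch: empty the destination and reload everything from the source.
-- Assumes dbo.Src and dbo.Dest exist as created above.
BEGIN TRANSACTION;

TRUNCATE TABLE dbo.Dest;

INSERT INTO dbo.Dest (ID, Name, Designation)
SELECT ID, Name, Designation
FROM dbo.Src;

COMMIT TRANSACTION;
```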

Incremental Load: This method works only on the records that have changed or were added recently. The old records are left untouched, and because it works on a small set of records, this method is faster and does not disturb any dependencies.


An incremental load can be built using SSIS, which I will explain in a separate blog post. Here we shall concentrate on how to achieve the same in SQL.


merge  [dbo].[Dest] as d
       using  [dbo].[Src] as s
       on d.id=s.id
when matched
       then update set d.designation=s.designation
when  not matched
       then insert (id,name,designation)
       values (s.id,s.name,s.designation)
when  not matched by source
       then delete;

Step-by-step explanation:

The MERGE keyword must be followed by the destination (target) table.
USING must be followed by the source table, and ON gives the condition used to match rows, here the ID.

When records match on ID, update the destination record with the values from the source.

When a source record has no match in the destination, it is a new record that is not yet present in the destination. Hence we use an INSERT.

When a destination record has no match in the SOURCE table (not matched by source), we delete that record. The MERGE statement must end with a semicolon.
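
The "update if changed" part can be made explicit by adding a condition to the WHEN MATCHED clause, so that rows whose values are already identical are not rewritten. The following is a sketch of one way to do this, assuming the same Src and Dest tables; it also updates Name and uses the OUTPUT clause to report what happened to each row. The aliases MergeAction, NewID and OldID are names chosen here for illustration, and the comparison assumes the columns contain no NULLs.

```sql
-- MERGE variant that only updates rows whose values actually differ,
-- and reports the action taken for every affected row.
-- Assumes dbo.Src and dbo.Dest exist as created above.
MERGE dbo.Dest AS d
USING dbo.Src AS s
    ON d.ID = s.ID
WHEN MATCHED AND (d.Name <> s.Name OR d.Designation <> s.Designation)
    THEN UPDATE SET d.Name = s.Name,
                    d.Designation = s.Designation
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ID, Name, Designation)
         VALUES (s.ID, s.Name, s.Designation)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
OUTPUT $action     AS MergeAction,  -- 'INSERT', 'UPDATE' or 'DELETE'
       inserted.ID AS NewID,        -- NULL for deleted rows
       deleted.ID  AS OldID;        -- NULL for inserted rows
```

WHEN NOT MATCHED BY TARGET means the same as the plain WHEN NOT MATCHED; it is spelled out here only to make the contrast with NOT MATCHED BY SOURCE easier to see.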

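After the upsert statement runs, Dest contains the rows below, which are the same as the rows in Src, so the two tables are synchronized:

Dest
ID  Name   Designation
1   Dhina  Snr.Analyst
2   Scott  Lead Analyst
5   Peter  Jnr. Analyst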





