Problem Description: Perform Incremental Load using SSIS package. There is one Source table with ID (may be Primary Key), CreatedDate and ModifiedDate along with other columns. The requirement is to load the destination table with new records and update the existing records (if any updated records are available).
Soultion:
You can use Lookup Transformation where you compare source and destination data based on some id/code and get the new and updated records, and then use Conditoional Split to select the new and updated rows before loading the table.
However, I don't recommend this approach, specially when destination table is very huge and volume of delta is very high.
You can do it in simple steps:
STEP1:
Create a new Package IncrementalLoad.dtsx and add following package variables:
STEP2:
Create Source, Destination, and staging tables using below code:
-- Source Table (create in source DB)
IF OBJECT_ID('dbo.TestSource','U') IS NOT NULL
DROP TABLE dbo.TestSource
GO
CREATE Table dbo.TestSource
(
[ID] [int] IDENTITY(1,1)
,[Code] [varchar](10)
,[Description] [varchar](100)
,[CreatedDate] [datetime] NOT NULL default GETDATE()
,[ModifiedDate] [datetime] NOT NULL default GETDATE()
)
GO
-- Destination Table (create in destination DB)
IF OBJECT_ID('dbo.TestDestination','U') IS NOT NULL
DROP TABLE dbo.TestDestination
GO
CREATE Table dbo.TestDestination
(
[ID] [int]
,[Code] [varchar](10)
,[Description] [varchar](100)
,[CreatedDate] [datetime] NOT NULL
,[ModifiedDate] [datetime] NOT NULL
)
GO
-- Staging Table (create in destination DB)
IF OBJECT_ID('dbo.TestDestinationSTG','U') IS NOT NULL
DROP TABLE dbo.TestDestinationSTG
GO
CREATE Table dbo.TestDestinationSTG
(
[ID] [int]
,[Code] [varchar](10)
,[Description] [varchar](100)
,[CreatedDate] [datetime] NOT NULL
,[ModifiedDate] [datetime] NOT NULL
)
GO
STEP3:
Create two OLE DB Connection Manager, one for Source Server and another for Destination Server.
In the connection manager properties, set the expression ConnectionString with respective variables as shown below:
STEP4:
Drag and drop Execute SQL Task and name it -
Execute SQL Task - Get Max ID and Last ModifiedDate.
Double click on EST and set following properties:
ResultSet = Single row
Connection = Destination Server
SQLStatement =
SELECT
ISNULL(MAX(ID) ,0) MaxID,
ISNULL(MAX(ModifiedDate),'2000-01-01') MaxModifiedDate
FROM TestDestination (NOLOCK)
Drag and drop another Execute SQL Task to Truncate staging table. Rename the task - Execute SQL Task - Truncate staging Table and set following properties:
ResultSet = None
Connection = Destination Server
SQLStatement = Truncate Table dbo.TestDestinationStg
Drag and drop Data Flow Task and name it Data Flow Task - Pull New and Updated rows.
Double click on DFT or click on Data Flow tab.
Now drag and drop OLE DB Source and select Source Server in OLE DB connection manager, select SQL command in Data access mode, and write following T-SQL code in SQL command text:
SELECT [ID],[Code],[Description],
[CreatedDate],[ModifiedDate]
FROM TestSource
WHERE [ID] > ? OR [ModifiedDate] >= ?
Click on Columns to generate metadata and set Parameters for above query:
Now drag and drop OLE DB Destination task and select Destination Server in OLE DB Connection manager, Table or view - fast load in Data access mode, and dbo.TestDestinationSTG in Name of the table or view.
Now click on Mapping to map the metadat with source columns.
Drag and drop one more Execute SQL Task and rename it to Execute SQL Task - Insert and Updated new rows and set following properties:
ResultSet = None
Connection = Destination Server
SQLStatement =
-- INSERT New Records
INSERT INTO TestDestination
([ID],[Code],[Description],
[CreatedDate],[ModifiedDate])
SELECT
[ID],[Code],[Description],
[CreatedDate],[ModifiedDate]
FROM TestDestinationStg
WHERE ID > ?
--UPDATE modified records
UPDATE D
SET D.[ID] = S.[ID]
,D.[Code] = S.[Code]
,D.[Description] = S.[Description]
,D.[CreatedDate] = S.[CreatedDate]
,D.[ModifiedDate] = S.[ModifiedDate]
FROM TestDestination D
JOIN TestDestinationStg S
ON S.ID = D.ID
WHERE
S.ID <= ? AND
S.ModifiedDate > D.ModifiedDate
Click on Parameter Mapping and set the parameters as shown below:
Finally package will look like below snapshot:
STEP5:
We are done with package development. Its time to test the package. I will test this package in three steps:
1. Insert few dummy records in Source table while keeping Destination table empty.
--CASE 1: First Time Execution
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('AAA','American Automobile Association')
WAITFOR DELAY '00:00:01.100' -- delay between two rows
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('ABC','Associated Builders & Contractors')
WAITFOR DELAY '00:00:01.150'
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('RR','Road Runner')
GO
Now execute the package and check the destination data. You can add Data viewers in Data Flow to see the result at run time as shown below:
2. Insert few more records in Source table to check whether new records are inserted into Destination table.
--CASE 2: Only New Records
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('TSL','Trina Solar Limited')
WAITFOR DELAY '00:00:01' -- delay between two rows
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('FSLR','First Solar, Inc.')
Now execute the package and check for new records:
3. Insert few more records and update few existing records in Source table and check whether both table are in sync.
--CASE 3 -- New & Updated Records
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('LDK','LDK Solar Co., Ltd')
UPDATE dbo.TestSource
SET [Description] = 'Associated Builders and Contractors',
ModifiedDate = GETDATE()
WHERE [Code] = 'ABC'
Finaly check source and destination tables for match. Since I have both Source & Destination tables in one server, I can use EXCEPT command (for this example) which retunrs no records i.e. both the tables are in sync.
Cheers!!!
You can use Lookup Transformation where you compare source and destination data based on some id/code and get the new and updated records, and then use Conditoional Split to select the new and updated rows before loading the table.
However, I don't recommend this approach, specially when destination table is very huge and volume of delta is very high.
You can do it in simple steps:
- Find the Maximum ID & Last ModifiedDate from destination and store in package variables.
- Pull the new and updated records from source and load to a staging table using above variables.
- Insert and Update the records using Execute SQL Task
STEP1:
Create a new Package IncrementalLoad.dtsx and add following package variables:
VariableName | Description | Examle |
DestinationConnStr | Connection string for destination server/db |
Data Source=(local); Initial Catalog=TestHN; Provider=SQLNCLI10.1; Integrated Security=SSPI; Auto Translate=False; |
MaxID | Max ID from destination table |
100 |
MaxModifiedDate | Max Date from destination table |
2010:11:10 11:50:20.003 |
SourceConnStr | Connection string for source server/db |
Data Source=(local); Initial Catalog=TestHN; Provider=SQLNCLI10.1; Integrated Security=SSPI; Auto Translate=False; |
STEP2:
Create Source, Destination, and staging tables using below code:
-- Source Table (create in source DB)
IF OBJECT_ID('dbo.TestSource','U') IS NOT NULL
DROP TABLE dbo.TestSource
GO
CREATE Table dbo.TestSource
(
[ID] [int] IDENTITY(1,1)
,[Code] [varchar](10)
,[Description] [varchar](100)
,[CreatedDate] [datetime] NOT NULL default GETDATE()
,[ModifiedDate] [datetime] NOT NULL default GETDATE()
)
GO
-- Destination Table (create in destination DB)
IF OBJECT_ID('dbo.TestDestination','U') IS NOT NULL
DROP TABLE dbo.TestDestination
GO
CREATE Table dbo.TestDestination
(
[ID] [int]
,[Code] [varchar](10)
,[Description] [varchar](100)
,[CreatedDate] [datetime] NOT NULL
,[ModifiedDate] [datetime] NOT NULL
)
GO
-- Staging Table (create in destination DB)
IF OBJECT_ID('dbo.TestDestinationSTG','U') IS NOT NULL
DROP TABLE dbo.TestDestinationSTG
GO
CREATE Table dbo.TestDestinationSTG
(
[ID] [int]
,[Code] [varchar](10)
,[Description] [varchar](100)
,[CreatedDate] [datetime] NOT NULL
,[ModifiedDate] [datetime] NOT NULL
)
GO
STEP3:
Create two OLE DB Connection Manager, one for Source Server and another for Destination Server.
In the connection manager properties, set the expression ConnectionString with respective variables as shown below:
STEP4:
Drag and drop Execute SQL Task and name it -
Execute SQL Task - Get Max ID and Last ModifiedDate.
Double click on EST and set following properties:
ResultSet = Single row
Connection = Destination Server
SQLStatement =
SELECT
ISNULL(MAX(ID) ,0) MaxID,
ISNULL(MAX(ModifiedDate),'2000-01-01') MaxModifiedDate
FROM TestDestination (NOLOCK)
Drag and drop another Execute SQL Task to Truncate staging table. Rename the task - Execute SQL Task - Truncate staging Table and set following properties:
ResultSet = None
Connection = Destination Server
SQLStatement = Truncate Table dbo.TestDestinationStg
Drag and drop Data Flow Task and name it Data Flow Task - Pull New and Updated rows.
Double click on DFT or click on Data Flow tab.
Now drag and drop OLE DB Source and select Source Server in OLE DB connection manager, select SQL command in Data access mode, and write following T-SQL code in SQL command text:
SELECT [ID],[Code],[Description],
[CreatedDate],[ModifiedDate]
FROM TestSource
WHERE [ID] > ? OR [ModifiedDate] >= ?
Click on Columns to generate metadata and set Parameters for above query:
Now drag and drop OLE DB Destination task and select Destination Server in OLE DB Connection manager, Table or view - fast load in Data access mode, and dbo.TestDestinationSTG in Name of the table or view.
Now click on Mapping to map the metadat with source columns.
Drag and drop one more Execute SQL Task and rename it to Execute SQL Task - Insert and Updated new rows and set following properties:
ResultSet = None
Connection = Destination Server
SQLStatement =
-- INSERT New Records
INSERT INTO TestDestination
([ID],[Code],[Description],
[CreatedDate],[ModifiedDate])
SELECT
[ID],[Code],[Description],
[CreatedDate],[ModifiedDate]
FROM TestDestinationStg
WHERE ID > ?
--UPDATE modified records
UPDATE D
SET D.[ID] = S.[ID]
,D.[Code] = S.[Code]
,D.[Description] = S.[Description]
,D.[CreatedDate] = S.[CreatedDate]
,D.[ModifiedDate] = S.[ModifiedDate]
FROM TestDestination D
JOIN TestDestinationStg S
ON S.ID = D.ID
WHERE
S.ID <= ? AND
S.ModifiedDate > D.ModifiedDate
Click on Parameter Mapping and set the parameters as shown below:
Finally package will look like below snapshot:
STEP5:
We are done with package development. Its time to test the package. I will test this package in three steps:
1. Insert few dummy records in Source table while keeping Destination table empty.
--CASE 1: First Time Execution
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('AAA','American Automobile Association')
WAITFOR DELAY '00:00:01.100' -- delay between two rows
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('ABC','Associated Builders & Contractors')
WAITFOR DELAY '00:00:01.150'
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('RR','Road Runner')
GO
2. Insert few more records in Source table to check whether new records are inserted into Destination table.
--CASE 2: Only New Records
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('TSL','Trina Solar Limited')
WAITFOR DELAY '00:00:01' -- delay between two rows
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('FSLR','First Solar, Inc.')
Now execute the package and check for new records:
3. Insert few more records and update few existing records in Source table and check whether both table are in sync.
--CASE 3 -- New & Updated Records
INSERT INTO dbo.TestSource ([Code],[Description])
VALUES ('LDK','LDK Solar Co., Ltd')
UPDATE dbo.TestSource
SET [Description] = 'Associated Builders and Contractors',
ModifiedDate = GETDATE()
WHERE [Code] = 'ABC'
Finaly check source and destination tables for match. Since I have both Source & Destination tables in one server, I can use EXCEPT command (for this example) which retunrs no records i.e. both the tables are in sync.
Cheers!!!
Great article with nice example... thanks a lot.
ReplyDeleteNice One
ReplyDeleteWhy do we need to take the max(ID) from destination?? Max(ModifiedDate) From destination is fine ..rt? Then using this Max(ModifiedDate) , retrieve records from Source where the modifiedDate > Max(ModifiedDate) .
ReplyDeleteOr if you are using 2008 R2 , you can use CDC feature
There are several ways to do that Manuraj :)
ReplyDeleteI agree your solution will work for the incremental load, albeit, you would require to handle which record to insert and which to update, which is kind of similar to your another suggestion of using SCD.
Preference of the solution should depends on the performance and then simplicity. How about you share some statistics on the performance of different approaches so that everybody get benefited out of that!
Nice one really it helps to understand Incremental load.
ReplyDeleteThanks
Madhu
Thank u for posting the incremental load in detail..If you perform deletion also it may be helpful to differentiate easily incremental load with scd transformation because deletion is the main difference between the scd transformation and incremental.
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteThanks for the tutorial which makes it easy to understand the incremental load procedure. However in the tutorial you are incrementally loading a table, Can we do an incremental load of a database? Any tutorial or help will be appreciated.
ReplyDeleteThis comment has been removed by the author.
DeleteA database consists of many objects like stored procedures, functions, views and also tables. Loading is something that happens only with tables, as long as you have the appropriate schema in the destination, you can loop through source tables using INFORMATION_SCHEMA and place the single table load logic in the iteration. Stage some auditing once for a batch you prefer to keep track of load time.
DeleteThanks for the article
ReplyDeleteA nice article here, i think that people who have grown up with the idea of using computers are showing more responsibility towards writing posts that are thoughtful, do not have grammar mistakes and pertinent to the post..
ReplyDeleteMSBI Training in Chennai
Informatica Training in Chennai
Dataware Housing Training in Chennai
Nice to know about SSIS package.
ReplyDeletemicrosoft access to sql server migration
ms access to sql server migration tool
ms access migration
access to sql server migration tool
I truly feel this blog has provided details on a very interesting topic related to SSIS operations.I truly feel functions such as SSIS insert,update and delete are very crucial for database connectivity.
ReplyDeletehttps://zappysys.com/products/ssis-powerpack/ssis-upsert-destination/
Perfectly formatted and accessible guide, pleasure to read, thanks a lot -
ReplyDeleteThank you so much for this nice information. Hope so many people will get aware of this and useful as well. And please keep update like this.
ReplyDeleteBig Data Consulting Services
Data Lake Solutions
Advanced Analytics Services
Full Stack Development Solutions
Nice Article.. It will helps a lot while handling large datasets
ReplyDeleteThank to the blogger for sharing such informative post, Not sure if you are interested in 3rd party product but here is the solution.
ReplyDeleteLink here" SSIS Update or Insert
thanks for sharing
ReplyDeletemsbi course
msbi training
It 's an amazing and awesome blog. Thanks for sharing
ReplyDeleteMsbi Online Training in Hyderabad
Msbi Online Training in India