Need to import Excel files to a SQL table. Everyday one file is created at specified location in the source system. We need to copy that file from Source System to Local system and then load to SQL Table.
Conditions:
1. Each file should be loaded only once. Everynight Job should be executed to load data into reporting Data mart.
2. Source system will maintain all the history files so files at souce should not be deleted.
3. If Job failed due to some reason (schema changes, server down, connection issues etc.), it should load all the files from last run date to current date in next successul run. For example, job didn't run last one week then whenever job runs next time successfully, it should load current file as well as all the files of last week which were missing.
4. All the source files will have the same structure (Schema)
5. Nomenclature - Each file will have name Transaction followed by current date in YYYY-MM-DD format. For example, if a file was created on 01-Aug-2010 then its name should be Transaction2010-08-01.
Solution
I will take advantage of the ForEach Loop Container. I'll create a new SSIS package to solve the above problem.
STEP 1:
Create following tables in your local database (destination database).
CREATE TABLE [TransactionHistory](
[TransactionID] [int] NOT NULL,
[ProductID] [int] NOT NULL,
[ReferenceOrderID] [int] NOT NULL,
[TransactionDate] [datetime] NOT NULL,
[TransactionType] [nchar](1) NOT NULL,
[Quantity] [int] NOT NULL,
[ActualCost] [money] NOT NULL,
CONSTRAINT [PK_TransactionHistory_TransactionID]
PRIMARY KEY CLUSTERED ([TransactionID] ASC)
) ON [PRIMARY]
GO
CREATE TABLE [ExcelFilesLog](
[FileName] [varchar](100),
[FileCreatedDate] [datetime],
[FileLoadDate] [datetime],
[RecordCount] [int],
) ON [PRIMARY]
GO
STEP 2: Create a new SSIS package and rename it to LoadDynamicExcelFiles.dtsx
STEP 3: Add following package variables:
VariableName | Description | Examle |
LastRunDate | To store last run date | 2010-07-30 |
CurrentDate | To hold running date | 2010-08-10 |
Directory Source | To store source directory path | D:\SSIS\Hari\SourceFiles |
Directory Destination | To store local directory path | D:\SSIS\Hari\DestinationFiles |
CurrentFile Name | To store current file name | D:\SSIS\Hari\SourceFiles\ Transaction2010-08-01 |
Create one OLE DB connection (I will use (local).TestHN connection manager) for local database where you want to load excel files data. Create one Excel Connection Manager for excel files located in D:\SSIS\Hari\DestinationFiles. At least one file should be there to create excel connection manager.
Click on Excel Connection Manager --> go to Properties window --> Select Expression and set ExcelFilePath with package variable User::CurrentFileName as shown below:
For this article, I'll use two directories - one for sources files and one for destination files. Location of these files are given below:
Source Files: D:\SSIS\Hari\SourceFilesDestination Files: D:\SSIS\Hari\DestinationFiles
I have created few excel files from Production.TransactionHistory table of AdventureWorks2008R2 database as shown below:
I have created files from 2010-07-20 to 2010-08-01. After executing the package first time, I will create files from 2010-08-02 to 2010-08-09 to test the package.
STEP 4:
Drag and drop File System Task and double click to open File System Task Editor. Enter FST - Delete destination directory content in name and select Delete directory content as Operation. Set IsSourcePathVariable to True and select SourceVariable as User::DirectoryDestination. Finally click on OK and save changes.
Drag and drop Execute SQL Task and double click to open Execute SQL Task Editor. Enter Execute SQL Task - Get LastRunDate in Name, select Single row as Result Set, Conection Type as OLE DB and Connection as (local).TestHN and SQLSourceType as Direct input. Enter below query in SQLStatement:
SELECT ISNULL(MAX([FileCreatedDate]),'2010-01-01') AS LastRunDate
FROM [dbo].[ExcelFilesLog] (NOLOCK)
FROM [dbo].[ExcelFilesLog] (NOLOCK)
In Result Set tab, set Result Name 0 to variable User::LastRunDate.
Finally click OK and save changes.
Drag and drop Script Task and double click to open Script Task Editor. Select User::DirectoryDestination, User::DirectorySource, User::LastRunDate in ReadOnlyVariables. Click on Edit Script... and paste below code:
using System.IO;
public void Main()
{
try
{
string DirectorySource = Dts.Variables["User::DirectorySource"].Value.ToString();
string DirectoryDestination = Dts.Variables["User::DirectoryDestination"].Value.ToString();
DateTime LastRunDate = (DateTime)Dts.Variables["User::LastRunDate"].Value;
string fileName,fileExtension;
string[] files = Directory.GetFiles(DirectorySource);
//Copy source files to destination
foreach (string f in files)
{
fileName = Path.GetFileName(f);
fileExtension = Path.GetExtension(f);
DateTime CurrentDate = DateTime.Parse(fileName.Substring("Transaction".Length, 10));
if ((DateTime.Compare(CurrentDate,LastRunDate)>0) && (fileExtension == ".xlsx"))
{
//MessageBox.Show(fileName.ToString());
File.Copy(f, Path.Combine(DirectoryDestination, fileName), true);
}
}
Dts.TaskResult = (int)ScriptResults.Success;
}
catch (Exception ex)
{
Dts.Log(ex.Message, 0, null);
Dts.TaskResult = (int)ScriptResults.Failure;
}
}
Drag and drop Foreach Loop container. Select Foreach File Enumerator as Enumerator, enter D:\SSIS\Hari\DestinationFiles in Folder path and *.xlsx in files textbox. Select Fully qualified as Retrieve file name. In Variable Mappings, Select User::CurrentFileName for Index 0 to store current file name for each iteration.
Now drag and drop Data Flow Task inside Foreach loop container. Use Excel Source reader to read excel files from destination directory. Use Excel Connection Manager as connection manager for excel files. In connection properties, Select @[User::CurrentFileName] as ExcelFilePath. Use Data Conversion, if required. Use Row Count Task to count number of rows in data flow and store it in User::RecordCount variable. Use OLE DB Destination to load data into SQL table.
Drag and drop Execute SQL Task inside Foreach loop container to log information about current file.
Double click on Execute SQL Task to open Execute SQL Task Editor, enter Execute SQL Task - Insert info into Log table as Name, None as Result Set, OLE DB as connection type, (local).TestHN as connection, Direct input as SQLSourceType and below query as SQLStatement.
DECLARE
@FileName varchar(500)
,@FilePath varchar(500)
,@DestinationPath varchar(500)
SET @DestinationPath = ?
SET @FilePath = ?
SET @FileName = REPLACE(@FilePath,@DestinationPath + '\','')
INSERT INTO [ExcelFilesLog]
(
[FileName]
,[FileCreatedDate]
,[FileLoadDate]
,[RecordCount]
)
SELECT
@FileName [FileName]
,CAST(SUBSTRING(@FileName,12,10) as datetime) [FileCreatedDate]
,GETDATE() [FileLoadDate]
,?
In Parameter mapping, map User::DirectoryDestination, User::CurrentFileName, and User::RecordCount with parameter 0,1,2 respectively as shown below.
Finally click OK and save changes.
Below is the final layout of our package:
STEP 6:
We are done with the package development. To execute the package, go to SSIS package location, (in this example, it is D:\SSIS\Hari\Sample\SSIS-Sample1), right click on LoadDynamicExcelFiles.dtsx --> Open with --> SQL Server 2008 Integration Services Package Execution Utility. This will open Execute Package Utility. Click on Execute button to run ssis package. Now you can see the progress of package execution in Package Execution Progress window.
We are done with the package development. To execute the package, go to SSIS package location, (in this example, it is D:\SSIS\Hari\Sample\SSIS-Sample1), right click on LoadDynamicExcelFiles.dtsx --> Open with --> SQL Server 2008 Integration Services Package Execution Utility. This will open Execute Package Utility. Click on Execute button to run ssis package. Now you can see the progress of package execution in Package Execution Progress window.
Now you can check ExcelFileLog table to cross check the result of package.
Now I'll add few more excel files in source location - from
Transaction2010-08-02 to Transaction2010-08-09 as shown below:
When I execute this package next time, it will load only new files. Yon can check ExcelFileLog for each iteration:
Hey i have tried the process ,on executing the script task "load excel sheet to table " I have been receiving the status as Failure. Please find the code below :
ReplyDeletePublic Class ScriptMain
Enum ScriptResults
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
End Enum
Public Sub Main()
'
' Add your code here
'
' Dts.TaskResult = Dts.Results.Success
Try
Dim DirectorySource As String
Dim DirectoryDestination As String
'Dim ScriptResults As ScriptTask
DirectorySource = Dts.Variables("User::DirectorySource").Value.ToString()
DirectoryDestination = Dts.Variables("User::DirectoryDestination").Value.ToString()
Dim LastRunDate As DateTime = CType(Dts.Variables("User::LastRunDate").Value, DateTime)
Dim fileName As String, fileExtension As String
Dim files() As String = Directory.GetFiles(DirectorySource)
'Copy source files to destination
Dim CurrentDate As Date
Dim f As String
For Each f In files
fileName = Path.GetFileName(f)
fileExtension = Path.GetExtension(f)
CurrentDate = DateTime.Parse(fileName.Substring("Transaction".Length, 10))
If (DateTime.Compare(CurrentDate, LastRunDate) > 0) Then
'MessageBox.Show(fileName.ToString());
File.Copy(f, Path.Combine(DirectoryDestination, fileName), True)
End If
Next
Dts.TaskResult = CType(ScriptResults.Success, Integer)
Catch ex As Exception
Dts.Log(ex.Message, 0, Nothing)
Dts.TaskResult = CType(ScriptResults.Failure, Integer)
End Try
End Sub
End Class
Thank you for your article. It is really intersting. Unfortunately if you are a NewB there are couple Steps that aren't detailed. I am getting the following error when tryied to folow your steps project step by step. could you help please?
ReplyDeleteError 1 Validation error. FST - Delete destination directory content File System Task: Variable "DirectoryDestination" is used as a source or destination and is empty.
Error 2 Validation error. Data Flow Task OLE DB Destination [26]: A connection manager has not been assigned to the runtime connection "OleDbConnection" (34).
Hi Ronaldomar,
ReplyDeleteError1: Did you set variable DirectoryDestination with default value? In my exapmle it is "D:\SSIS\Hari\DestinationFiles". However you can configure this value through package configuration.
Error2: It seems that you haven't properly configured your connection Manger which is used for OLE DB Destination.
it is an old but no doubt it is quite usefull to learn SSIS but unfortunately it is missing many many steps, and without those one could get easily lost and purpose of education dies.
ReplyDeleteI wish you could have explained in segments and without skipping some fundamental tasks.
You efforts are appreciated very highly but efforts without a fruitful results is not good
Thanks ZeeAay for the great feedback. Albeit, I would appreciate if you mention "MANY MANY step" which are missing...
DeleteI wrote this article couple of years back by following actual code I wrote for some real project. I tried to include all the steps I could remember but it would be great if you can mention what steps are actually missing.
Please provide the all steps will be appreciated.
ReplyDeleteExcellent article and with lots of information. I really learned a lot here. Do share more like this
ReplyDeleteWeb Development Training In Chennai
Web Development Course In Chennai
Web Development Online Training
Awesome article! You are providing us very valid information. This is worth reading. Keep sharing more such articles.
ReplyDeletehow Data Science Changing the World Today
amazing Post.
ReplyDeleteartificial intelligence Classes in Pune