Splitting Records from a Pipe-Delimited Flat File Using SSIS


I have a pipe-delimited flat file with about 25,000 rows. Some rows are blank (they contain only white space), and the heading and subheading rows occur several times. I would like to keep only one heading row and omit all other heading rows, subheading rows, and blank rows. I did this with the C# script at the end of this post, which uses StreamReader and StreamWriter. Although the script worked, it took over 9 hours to run. I recently started using SSIS and am now looking at using an SSIS Conditional Split Transformation instead.

My data looks something like this:

```
*[white space]* Business Unit: 099 - HAA/DEPT OF SSSSSS SSSSSSSS
*[white space]* Empl Id  |  Employee Name  |  Dept Id  |  Department  |  EE Home Phone  |  Emergency Contact Name  |  Primary  |  Telephone  |  Relationship 
*[white space]*  0144111 | Adams Cdddddddd | 0990101 | Executive/Director-NM | *********** | *****NO CONTACT***** |  |  |
1555444 | Bookk Derkkk Yeeee | 0990101 | Director/Manager-NM | *********** | AAALL SELLELL | Y | 646/711-9999| Parent
1555444 | Bookk Derkkk Yeeee | 0990101 | Director/Manager-NM | *********** | YYYXXX DeVaaa | N | 212/344-2222| Oth Relat
1555444 | Bookk Derkkk Yeeee | 0990101 | Director/Manager-NM | *********** | SSLL Wildddd | N | 917/255-5555| Oth Relat
1555444 | Bookk Derkkk Yeeee | 0990101 | Director/Manager-NM | *********** | CCLL A. Sree | N | 917/666-3333| Friend
*[white space]*  Business Unit: 099 - HAA/DEPT OF SSSSSS SSSSSSSS
*[white space]* Empl Id  |  Employee Name  |  Dept Id  |  Department  |  EE Home Phone  |  Emergency Contact Name  |  Primary  |  Telephone  |  Relationship 
*[white space]*  0144111 | Adams Cdddddddd | 0990101 | Executive/Director-NM | *********** | *****NO CONTACT***** |  |  |
1555444 | Bookk Derkkk Yeeee | 0990101 | Director/Manager-NM | *********** | AAALL SELLELL | Y | 646/711-9999| Parent
1555444 | Bookk Derkkk Yeeee | 0990101 | Director/Manager-NM | *********** | YYYXXX DeVaaa | N | 212/344-2222| Oth Relat
1555444 | Bookk Derkkk Yeeee | 0990101 | Director/Manager-NM | *********** | SSLL Wildddd | N | 917/255-5555| Oth Relat
1555444 | Bookk Derkkk Yeeee | 0990101 | Director/Manager-NM | *********** | CCLL A. Sree | N | 917/666-3333| Friend
```
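To make the row structure concrete, here is a minimal sketch that splits one line on the pipe character and checks that it has the 9 columns named in the heading. `ReportRow` and `TryParse` are hypothetical names, and trimming the padding around every field is an assumption about how the spaces next to the pipes should be treated.

```csharp
using System;
using System.Linq;

// Hypothetical helper: splits one data row of the report into its 9 named fields.
public static class ReportRow
{
    // Column names in the order they appear in the report heading.
    public static readonly string[] Columns =
    {
        "Empl Id", "Employee Name", "Dept Id", "Department", "EE Home Phone",
        "Emergency Contact Name", "Primary", "Telephone", "Relationship"
    };

    // Splits on '|' and trims the padding around each field.
    // Returns null when the line does not contain exactly 9 fields
    // (for example the "Business Unit: ..." subheading, which has no pipes).
    public static string[] TryParse(string line)
    {
        string[] fields = line.Split('|').Select(f => f.Trim()).ToArray();
        return fields.Length == Columns.Length ? fields : null;
    }
}
```

A row with no emergency contact, such as the `0144111` row, still yields 9 fields; its last three are empty strings.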

The SSIS Data Flow contains the following:

1) A Flat File Source with data similar to that shown above. There are 9 columns.
2) A Conditional Split with the following conditions, whose outputs go to Multicast, Multicast1 and Multicast2:

   i) RowsToOmit: `LTRIM(RTRIM([Empl Id ])) == ""`

   ii) SecondTextToOmit: `LTRIM(RTRIM([Empl Id ])) == "Business Unit: 099 - HAA/DEPT OF SSSSSS SSSSSSSS"`

   iii) The good rows go to GoodRows.

Ultimately, I also want to omit all other occurrences of the following header, keeping only the first one: "Empl Id | Employee Name | Dept Id | Department | EE Home Phone | Emergency Contact Name | Primary | Telephone | Relationship"
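For comparison, the same routing rules, plus the "keep only the first header" rule, can be sketched in C# as a small stateful classifier. This is a hypothetical illustration, not an SSIS component: `RowKind`, `RowClassifier` and `Classify` are invented names. One assumption worth flagging is that it collapses runs of white space before comparing, so the double-spaced heading in the file (`Empl Id  |  Employee Name ...`) matches the single-spaced header text quoted above.

```csharp
using System;

// Hypothetical categories mirroring the Conditional Split outputs described above.
public enum RowKind { Blank, SubHeading, FirstHeader, RepeatedHeader, Good }

public sealed class RowClassifier
{
    private readonly string _headerText;
    private readonly string _subHeadingText;
    private bool _headerSeen; // becomes true once the first header has been returned

    public RowClassifier(string headerText, string subHeadingText)
    {
        _headerText = Normalize(headerText);
        _subHeadingText = Normalize(subHeadingText);
    }

    // Trims and collapses every run of white space to a single space
    // (a null separator makes Split break on any white-space character).
    private static string Normalize(string s) =>
        string.Join(" ", s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

    public RowKind Classify(string line)
    {
        string normalized = Normalize(line);
        if (normalized.Length == 0) return RowKind.Blank;
        if (normalized == _subHeadingText) return RowKind.SubHeading;
        if (normalized == _headerText)
        {
            if (_headerSeen) return RowKind.RepeatedHeader;
            _headerSeen = true;
            return RowKind.FirstHeader;
        }
        return RowKind.Good;
    }
}
```

Rows classified as `FirstHeader` or `Good` would be kept; everything else would be dropped.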

I prepared the SSIS package, but when I ran it I got the following errors:

Error: 0xC02020A1 at HAA Conditional Split Transformation, Flat File Source [1]: Data conversion failed. The data conversion for column "Empl Id " returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".

Error: 0xC020902A at HAA Conditional Split Transformation, Flat File Source [1]: The "output column "Empl Id " (63)" failed because truncation occurred, and the truncation row disposition on "output column "Empl Id " (63)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.

Error: 0xC0202092 at HAA Conditional Split Transformation, Flat File Source [1]: An error occurred while processing file "Z:\Emergency Contact Report\TEST\PER004-069-TEST.txt" on data row 251.

Error: 0xC0047038 at HAA Conditional Split Transformation, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "Flat File Source" (1) returned error code 0xC0202092. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
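The errors report truncation of the "Empl Id " column. One rough way to look for suspicious lines is to measure the text before the first pipe on each physical line and compare it with the width defined for Empl Id in the Flat File Connection Manager. The sketch below assumes you pass that width in yourself (50 in the test is only an assumed value); `WidthCheck` and `FindLongFirstFields` are hypothetical names. It checks each physical line on its own and does not reproduce how the Flat File Source actually assigns text to columns, and SSIS's "data row 251" may not map one-to-one onto physical line numbers, so treat the output as a hint rather than a diagnosis.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WidthCheck
{
    // Yields (1-based physical line number, length) for every line whose text before
    // the first '|' (or the whole line, if it has no '|') is longer than maxWidth.
    public static IEnumerable<(int LineNumber, int Length)> FindLongFirstFields(TextReader reader, int maxWidth)
    {
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            int pipe = line.IndexOf('|');
            string first = pipe >= 0 ? line.Substring(0, pipe) : line;
            if (first.Length > maxWidth)
                yield return (lineNumber, first.Length);
        }
    }
}
```

Usage would be along the lines of `foreach (var (n, len) in WidthCheck.FindLongFirstFields(new StreamReader(path), 50)) Console.WriteLine($"line {n}: {len}");`, where `path` is a placeholder for the input file.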

Why am I getting this error? Ultimately, I want to keep only one heading row and omit all other heading rows, subheading rows, and blank rows. Can you also help me determine the best and most efficient way to do this?

C# Script:

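```csharp
// Fragment of the original script. sourcePath, destinationPath, headerText,
// secondTextToOmit, thirdTextToOmit, Line, outputText and headerCount are
// declared elsewhere in the script.
using (StreamReader sr = new StreamReader(sourcePath))
{
    while ((Line = sr.ReadLine()) != null)
    {
        // Write the 1st occurrence of the heading
        if (Line.Trim() == headerText && headerCount == 0)
        {
            outputText = outputText + Line + Environment.NewLine;
            headerCount++;
        }
        else
        {
            // Store text in variables to do all checks in the same if statement:
            // keep every non-blank line that is not a heading or subheading
            if (Line.Trim() != "" && Line.Trim() != headerText && Line != headerText && Line != secondTextToOmit && Line != thirdTextToOmit)
            {
                outputText = outputText + Line + Environment.NewLine;
            }
        }
        // Opens destinationPath and rewrites all of outputText once per input line
        using (StreamWriter writer = new StreamWriter(destinationPath))
        {
            // Write the string using filtered text
            writer.WriteLine(outputText);
        }
    }
}
```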
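In the script above, a new StreamWriter is created and the entire accumulated `outputText` is rewritten on every iteration of the read loop, and `outputText` itself grows by string concatenation, so the total work grows roughly with the square of the number of lines. That is a likely contributor to the long run time. A minimal sketch of a single-pass alternative follows, assuming the same `headerText` and omit strings as the original script; `ReportFilter`, `Filter` and `FilterFile` are hypothetical names. One deliberate difference: the omit strings are compared against the trimmed line, whereas the original compares them against the untrimmed `Line`.

```csharp
using System;
using System.IO;

public static class ReportFilter
{
    // Single pass: each line is read, checked and (if kept) written immediately,
    // so nothing accumulates in memory and the output is opened only once.
    public static void Filter(TextReader reader, TextWriter writer,
                              string headerText, params string[] textsToOmit)
    {
        bool headerWritten = false;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0) continue;                  // blank / whitespace-only row
            if (trimmed == headerText)
            {
                if (!headerWritten)
                {
                    writer.WriteLine(line);                     // first heading is kept
                    headerWritten = true;
                }
                continue;                                       // later headings are dropped
            }
            if (Array.IndexOf(textsToOmit, trimmed) >= 0) continue; // subheadings etc.
            writer.WriteLine(line);
        }
    }

    // File-to-file wrapper: one StreamReader and one StreamWriter for the whole run.
    public static void FilterFile(string sourcePath, string destinationPath,
                                  string headerText, params string[] textsToOmit)
    {
        using (var reader = new StreamReader(sourcePath))
        using (var writer = new StreamWriter(destinationPath))
        {
            Filter(reader, writer, headerText, textsToOmit);
        }
    }
}
```

With the original variables this would be called as `ReportFilter.FilterFile(sourcePath, destinationPath, headerText, secondTextToOmit, thirdTextToOmit);`. Taking a `TextReader`/`TextWriter` in `Filter` keeps the logic testable with in-memory strings.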
Tags: file, ssis, split, record, flat
asked on Stack Overflow Dec 14, 2018 by Sast77 • edited Dec 15, 2018 by wp78de

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0