How to split a large dataset into multiple Excel spreadsheets using an SSIS package?

2

I'm facing a problem with an SSIS package that does the following:

  • A Data Flow Task runs a query to fetch some data from the database (SQL Server 2008).
  • The extracted data is exported to an Excel 97-2003 spreadsheet (.xls) using an Excel Destination.

As most of you know, .xls files are limited to 65,536 rows by 256 columns per sheet. So when the query returns more than 65,536 records, the Excel Destination step fails.

I get the following error messages.

Error: 0xC0202009 at Calidad VIDA, Excel Destination [82]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.

Error: 0xC0209029 at Calidad VIDA, Excel Destination [82]: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "input "Excel Destination Input" (93)" failed because error code 0xC020907B occurred, and the error row disposition on "input "Excel Destination Input" (93)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.

Error: 0xC0047022 at Calidad VIDA, SSIS.Pipeline: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Excel Destination" (82) failed with error code 0xC0209029 while processing input "Excel Destination Input" (93). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.

Error: 0xC02020C4 at Calidad VIDA, OLE DB Source [1]: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.

Error: 0xC0047038 at Calidad VIDA, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "OLE DB Source" (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

The file needs to be in that format because the clients don't have newer versions of Excel and don't want to buy licenses. Does anyone know how to work around this issue? Should I use a Script Task and build the Excel file myself, or should I use a Foreach Loop and create several Excel workbooks?

c#
sql
sql-server-2008
excel
ssis
asked on Stack Overflow Jul 12, 2011 by Mr. • edited Oct 5, 2012 by (unknown user)

3 Answers

6

Here is one possible option for creating Excel worksheets dynamically in SSIS, based on the number of records you want to write per Excel sheet. It doesn't involve Script Tasks. The following example shows how this can be achieved using Execute SQL Tasks, a For Loop container and a Data Flow Task. The example was created using SSIS 2008 R2.

Step-by-step process:

  1. In the SQL Server database, run the scripts provided under the SQL Scripts section. These scripts create a table named dbo.SQLData and populate it with multiplication data from 1 x 1 through 20 x 40, thereby creating 800 records. They also create a stored procedure named dbo.FetchData, which is used in the SSIS package.

  2. In the SSIS package, create 9 variables as shown in screenshot #1. The following steps describe how each of these variables is configured.

  3. Set the variable ExcelSheetMaxRows to the value 80. This variable represents the number of rows to write per Excel sheet. You can set it to a value of your choice. In your case, this would be 65,535 (you might want to leave 1 row for the header column names).

  4. Set the variable SQLFetchTotalRows to the value SELECT COUNT(Id) AS TotalRows FROM dbo.SQLData. This variable contains the query that fetches the total row count from the table.

  5. Select the variable StartIndex and open Properties by pressing F4. Set the property EvaluateAsExpression to True and the property Expression to the value (@[User::Loop] * @[User::ExcelSheetMaxRows]) + 1. Refer to screenshot #2.

  6. Select the variable EndIndex and open Properties by pressing F4. Set the property EvaluateAsExpression to True and the property Expression to the value (@[User::Loop] + 1) * @[User::ExcelSheetMaxRows]. Refer to screenshot #3.

  7. Select the variable ExcelSheetName and open Properties by pressing F4. Set the property EvaluateAsExpression to True and the property Expression to the value "Sheet" + (DT_WSTR,12) (@[User::Loop] + 1). Refer to screenshot #4.

  8. Select the variable SQLFetchData and open Properties by pressing F4. Set the property EvaluateAsExpression to True and the property Expression to the value "EXEC dbo.FetchData " + (DT_WSTR, 15) @[User::StartIndex] + "," + (DT_WSTR, 15) @[User::EndIndex]. Refer to screenshot #5.

  9. Select the variable ExcelTable and open Properties by pressing F4. Set the property EvaluateAsExpression to True and the property Expression to the value provided under the ExcelTable Variable Value section. Refer to screenshot #6.

  10. On the SSIS package's Control Flow tab, place an Execute SQL Task and configure it as shown in screenshots #7 and #8. This task will fetch the record count.

  11. On the SSIS package's Control Flow tab, place a For Loop Container and configure it as shown in screenshot #9. Note that this is a For Loop, not a Foreach Loop. The loop runs once per Excel sheet; the number of iterations follows from the total number of records in the table and the number of records to write per sheet.

  12. Create an Excel spreadsheet in Excel 97-2003 format with the .xls extension as shown in screenshot #10. I created the file in C:\temp.

  13. In the SSIS package's connection managers, create an OLE DB connection named SQLServer pointing to SQL Server and an Excel connection named Excel pointing to the newly created Excel file.

  14. Click on the Excel connection and select Properties. Change the property DelayValidation from False to True so that no error messages appear when we switch to using a variable for sheet creation in the Data Flow Task. Refer to screenshot #11.

  15. Inside the For Loop container, place an Execute SQL Task and configure it as shown in screenshot #12. This task will create Excel worksheets based on the requirements.

  16. Inside the For Loop container, place a Data Flow Task. Once the tasks are configured, the Control Flow tab should look as shown in screenshot #13.

  17. Inside the Data Flow Task, place an OLE DB Source to read data from SQL Server using the stored procedure. Configure the OLE DB Source as shown in screenshots #14 and #15.

  18. Inside the Data Flow Task, place an Excel Destination to insert the data into the Excel sheets. Configure the Excel destination as shown in screenshots #16 and #17.

  19. Once the Data Flow Task is configured, it should look as shown in screenshot #18.

  20. Delete the Excel file that was created in step 12, because the package creates the file automatically when executed. If it is not deleted, the package throws an exception stating that Sheet1 already exists. This example uses the path C:\temp\ and screenshot #19 shows there are no files in that path.

  21. Screenshots #20 and #21 show the package execution inside Control Flow and Data Flow tasks.

  22. Screenshot #22 shows that the file ExcelData.xls has been created in the path C:\temp. Remember, this path was empty earlier. Since the table had 800 rows and the package variable ExcelSheetMaxRows was set to 80 rows per sheet, the Excel file has 10 sheets. Refer to screenshot #23.

  23. NOTE: One thing I haven't done in this example is check whether the file ExcelData.xls already exists in the path C:\temp. If it exists, it should be deleted before the tasks are executed. This can be achieved by creating a variable that holds the Excel file path and using a File System Task to delete the file before the first Execute SQL Task runs.
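The index arithmetic behind the StartIndex, EndIndex and ExcelSheetName expressions can be checked outside SSIS. What follows is only a minimal Python sketch, not part of the package. The function name `sheet_ranges` is made up for illustration, and the loop condition (`Loop` starts at 0 and runs while `Loop * ExcelSheetMaxRows < TotalRows`) is an assumption, since the actual For Loop settings appear only in screenshot #9.

```python
def sheet_ranges(total_rows, max_rows_per_sheet):
    """Yield (sheet_name, start_index, end_index) for each loop iteration.

    Mirrors the SSIS expressions from the steps above:
      StartIndex     = (Loop * ExcelSheetMaxRows) + 1
      EndIndex       = (Loop + 1) * ExcelSheetMaxRows
      ExcelSheetName = "Sheet" + (Loop + 1)
    The loop condition is an assumption (see the note above the code).
    """
    loop = 0
    while loop * max_rows_per_sheet < total_rows:
        start_index = loop * max_rows_per_sheet + 1
        end_index = (loop + 1) * max_rows_per_sheet
        yield f"Sheet{loop + 1}", start_index, end_index
        loop += 1


# The example data set: 800 rows, 80 rows per sheet.
ranges = list(sheet_ranges(total_rows=800, max_rows_per_sheet=80))
for name, start, end in ranges:
    print(name, start, end)
```

With 800 rows and 80 rows per sheet this yields ten ranges, from Sheet1 (1 to 80) through Sheet10 (721 to 800). When the row count is not an exact multiple, the last EndIndex exceeds the row count, and the BETWEEN filter in the stored procedure then simply returns fewer rows for the final sheet.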

Hope that helps.

ExcelTable Variable Value:

"CREATE TABLE `" + @[User::ExcelSheetName] + "`(`Id` Long, `Number1` Long, `Number2` Long, `Value` Long)"
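To see what this expression evaluates to on a given iteration, here is a small Python sketch that builds the same string. It is only an illustration of the concatenation; the helper name `excel_create_table` is hypothetical and nothing here runs against Excel.

```python
def excel_create_table(sheet_name):
    """Build the statement the ExcelTable expression produces for one sheet."""
    return (
        f"CREATE TABLE `{sheet_name}`"
        "(`Id` Long, `Number1` Long, `Number2` Long, `Value` Long)"
    )


# Third loop iteration: Loop = 2, so ExcelSheetName is "Sheet3".
statement = excel_create_table("Sheet3")
print(statement)
```

The Execute SQL Task inside the For Loop container runs this statement over the Excel connection, which is what adds a worksheet with those four columns.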

SQL Scripts:

--Create table

CREATE TABLE [dbo].[SQLData](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Number1] [int] NOT NULL,
    [Number2] [int] NOT NULL,
    [Value] [int] NOT NULL,
CONSTRAINT [PK_Multiplication] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

--Populate table with data

SET NOCOUNT ON

DECLARE @OuterLoop INT
DECLARE @InnerLoop INT
SELECT @OuterLoop = 1

WHILE @OuterLoop <= 20 BEGIN

    SELECT @InnerLoop = 1   
    WHILE @InnerLoop <= 40 BEGIN

            INSERT INTO dbo.SQLData (Number1, Number2, Value)
            VALUES (@OuterLoop, @InnerLoop, @OuterLoop * @InnerLoop)

            SET @InnerLoop = @InnerLoop + 1
    END

    SET @OuterLoop = @OuterLoop + 1
END

SET NOCOUNT OFF
GO

--Create stored procedure

CREATE PROCEDURE [dbo].[FetchData]
(
        @StartIndex INT
    ,   @EndIndex   INT
)
AS
BEGIN

    SELECT  Id
        ,   Number1
        ,   Number2
        ,   Value
    FROM    (
                SELECT  ROW_NUMBER() OVER(ORDER BY Id) AS RowNumber
                    ,   Id
                    ,   Number1
                    ,   Number2
                    ,   Value 
                FROM    dbo.SQLData
            ) T1
    WHERE   RowNumber BETWEEN @StartIndex AND @EndIndex
END
GO
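The paging done by the stored procedure can be mimicked in plain Python to check which rows land on which sheet. This is a rough sketch under the assumption that rows are numbered from 1 in Id order; the names `build_rows` and `fetch_data` are illustrative and do not touch SQL Server.

```python
def build_rows():
    """Recreate the 800 multiplication rows as (Id, Number1, Number2, Value)."""
    rows = []
    next_id = 1
    for number1 in range(1, 21):
        for number2 in range(1, 41):
            rows.append((next_id, number1, number2, number1 * number2))
            next_id += 1
    return rows


def fetch_data(rows, start_index, end_index):
    """Return rows whose 1-based position, ordered by Id, is within the range.

    Both bounds are inclusive, like BETWEEN in the stored procedure.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    return [
        row
        for position, row in enumerate(ordered, start=1)
        if start_index <= position <= end_index
    ]


rows = build_rows()
page = fetch_data(rows, 81, 160)  # the range used for the second sheet
print(len(page), page[0], page[-1])
```

Because both bounds are inclusive, consecutive ranges such as 1 to 80 and 81 to 160 neither overlap nor skip a row.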

Screenshots #1 to #23: [images not available]

answered on Stack Overflow Jul 13, 2011 by (unknown user) • edited Jul 16, 2011 by (unknown user)
2

The stock Excel rendering option allows the output to be paginated onto separate tabs. If you force a page break after the appropriate number of rows, you get a new tab for each "page" in your output. I don't have the settings I used in the past at hand, but I can look them up tomorrow if you need me to.

answered on Stack Overflow Jul 12, 2011 by William Salzman
0

What also worked for me was using the SSIS package to export to a CSV file and then manually importing the data into Excel. Note that this is not the same as "Open" in Excel, because that will also stop at 65,536 rows. Create a new xlsx file and click "Data" -> "From Text". It will import and show all rows. Tested with 750,000 rows.

However, I'm not sure whether the csv -> xlsx conversion is easily scripted within an SSIS package. It would most likely be done via a Script Task using the Excel COM object.
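If the recipients can only open .xls files, a variation on the CSV route is to split the exported CSV into pieces that each stay within the 65,536-row sheet limit. This is a different technique from the import described above, shown only as a minimal Python sketch using the standard library. The function names are hypothetical, and it assumes the first CSV line is a header that should be repeated in every piece.

```python
import csv
import io

XLS_MAX_ROWS = 65536  # per-sheet row limit of the .xls format, header included


def _to_csv(header, rows):
    """Serialize a header plus data rows to CSV text with \\n line endings."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def split_csv(source, max_rows=XLS_MAX_ROWS):
    """Split CSV input into chunks of at most max_rows lines each.

    Every chunk repeats the header, so it carries at most
    max_rows - 1 data rows. Returns a list of CSV strings.
    """
    reader = csv.reader(source)
    header = next(reader)
    data_per_chunk = max_rows - 1
    chunks = []
    current = []
    for row in reader:
        current.append(row)
        if len(current) == data_per_chunk:
            chunks.append(_to_csv(header, current))
            current = []
    if current:
        chunks.append(_to_csv(header, current))
    return chunks


# Tiny demonstration with a limit of 3 lines per chunk (header + 2 rows).
sample = io.StringIO("Id,Value\n1,10\n2,20\n3,30\n4,40\n5,50\n")
chunks = split_csv(sample, max_rows=3)
print(len(chunks))
```

Each resulting piece could then be written to its own file and opened or imported separately. Turning the pieces into worksheets of a single workbook would still need Excel itself or a library, which this sketch does not cover.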

answered on Stack Overflow Feb 17, 2012 by mdisibio

User contributions licensed under CC BY-SA 3.0