I am automating some things at work, so I decided to wrap them in an SSIS package. I have been working on this for months, and one of the problems I faced at the beginning has resurfaced.
I receive a report through email, which is downloaded, renamed and placed into L:\MACROS\SSIS\Input (this is done by a C# application I created).
I then import the data from that report into SQL.
The problem is here: when I read the data from the .xls file, one specific column behaves in one of two ways. If the top row of data is purely numeric, the column is automatically typed as numeric and only numeric values are imported; anything non-numeric is turned into null.
This column is the invoice number, which is usually numeric, but there is one world region where the values are non-numeric (e.g. "MAGI:1326564"). I get this error message when I open my data flow object:
TITLE: Microsoft Visual Studio
The metadata of the following output columns does not match the metadata of the external columns with which the output columns are associated:
Output "Excel Source Output": "F11"
Do you want to replace the metadata of the output columns with the metadata of the external columns?
------------------------------ BUTTONS:
&Yes &No
I can either get the numerics or the non-numeric values.
Now, as I wanted a permanent fix I thought about just using C# to create a separate column for non-numeric and delete them from the original column.
That way I have a reusable method of fixing the above issue.
try
{
    //Start Excel and get Application object.
    oXL = new Microsoft.Office.Interop.Excel.Application();
    oXL.Visible = false;
    oWB = (Microsoft.Office.Interop.Excel._Workbook)(oXL.Workbooks.Open(@"L:\MACROS\SSIS\Input\A2_POST_ADVICE_FOR_DUTY_LINES.xls"));
    oSheet = (Microsoft.Office.Interop.Excel._Worksheet)oWB.ActiveSheet;
    /* int nInLastRow = oSheet.Cells.Find("*", System.Reflection.Missing.Value,
    System.Reflection.Missing.Value, System.Reflection.Missing.Value, Microsoft.Office.Interop.Excel.XlSearchOrder.xlByRows, Microsoft.Office.Interop.Excel.XlSearchDirection.xlPrevious, false, System.Reflection.Missing.Value, System.Reflection.Missing.Value).Row;
    */
    var j = 7;
    var cellValue = (string)(oSheet.Cells[7, 11] as Microsoft.Office.Interop.Excel.Range).Value;
    // while (j < 20)/*nInLastRow)*/
    // {
    i = 0;
    foreach (char value in cellValue)
    {
        bool digit = char.IsDigit(value);
        if (digit == true)
        {
            i = i + 1;
        }
        else { i = i + 0; }
    }
    if (i > 1)
    {
        oSheet.Cells[j, 22] = cellValue;
        //oSheet.Cells[j, 11].Clear();
    }
    // Close the workbook, tell it to save and give the path.
    // j = j + 1;
    // }
    oXL.DisplayAlerts = false;
    oWB.SaveAs(@"L:\MACROS\SSIS\Input\A2_POST_ADVICE_FOR_DUTY_LINES.xls", Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlNoChange, Type.Missing, Type.Missing, Type.Missing,Type.Missing, Type.Missing);
    oWB.Close();
    // Now quit the application.
    oXL.Quit();
    // Call the garbage collector to collect and wait for finalizers to finish.
    GC.Collect();
    GC.WaitForPendingFinalizers();
    // Release the COM objects that have been instantiated.
    Marshal.FinalReleaseComObject(oWB);
    Marshal.FinalReleaseComObject(oSheet);
    // Marshal.FinalReleaseComObject(oRng);
    Marshal.FinalReleaseComObject(oXL);
}
catch (Exception theException)
{
    String errorMessage;
    errorMessage = "Error: ";
    errorMessage = String.Concat(errorMessage, theException.Message);
    errorMessage = String.Concat(errorMessage, " Line: ");
    errorMessage = String.Concat(errorMessage, theException.Source);
    MessageBox.Show(errorMessage, "Error");
}
I keep getting this error message when running the C# code:
"Cannot convert type double to string."
The code worked (for 2 tries) before I implemented the loop. After I implemented the loop it stopped working, so I commented the loop out, but I still get the same error.
I have also changed:
var cellValue = (string)(oSheet.Cells[7, 11] as Microsoft.Office.Interop.Excel.Range).Value;
to
var cellValue = (oSheet.Cells[7, 11] as Microsoft.Office.Interop.Excel.Range).Value.ToString();
With this change it worked for 2 tests and then stopped working.
If I change it to:
string cellValue = "MA1352564";
it does what I want, so I have narrowed the issue down to converting the value of the cell to a string, which is needed so that the code can then check whether the characters in the string are digits or not.
I am looking for either a different solution to my import problem or any ideas on how to fix the C# section of code :)
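A likely explanation for the conversion error, offered as a sketch rather than a confirmed diagnosis: Range.Value returns an object whose runtime type depends on the cell, typically a double for numeric cells, a string for text cells and null for empty cells. Casting a double to (string) fails, and calling .ToString() fails on an empty cell, which would match code that works on some rows and not on others. The helper below converts whatever comes back into text first. The names CellText, ToText and IsAllDigits are hypothetical and not part of the original code.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper (not from the original post): turns the object returned
// by Range.Value into text without assuming the cell holds a string.
public static class CellText
{
    // cellValue is expected to be a double, a string or null,
    // depending on what the Excel cell contains.
    public static string ToText(object cellValue)
    {
        if (cellValue == null)
        {
            return string.Empty; // empty cell
        }
        // Convert.ToString handles double and string alike; the invariant
        // culture avoids locale-specific decimal separators.
        return Convert.ToString(cellValue, CultureInfo.InvariantCulture);
    }

    // True only when the text is non-empty and every character is a digit.
    public static bool IsAllDigits(string text)
    {
        return text.Length > 0 && text.All(char.IsDigit);
    }
}
```

In the posted code this could be used as `string cellValue = CellText.ToText((oSheet.Cells[j, 11] as Microsoft.Office.Interop.Excel.Range).Value);`. Note also that the posted loop counts digits, so `i > 1` is true for purely numeric values as well; testing `!CellText.IsAllDigits(cellValue)` may be closer to the stated intent of moving only the non-numeric invoice numbers to column 22.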
EDIT: I forgot to mention that if I enable the data viewer in the flow, the data coming out of Excel is already stripped of the non-numeric data.
EDIT2:
After using the suggested options I get this error:
Error: 0xC0202009 at DataInputUni, Excel Source [12]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E21. An OLE DB record is available. Source: "Microsoft JET Database Engine" Hresult: 0x80040E21 Description: "Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.".
Error: 0xC0208265 at DataInputUni, Excel Source [12]: Failed to retrieve long data for column "F11".
Error: 0xC020901C at DataInputUni, Excel Source [12]: There was an error with Excel Source.Outputs[Excel Source Output].Columns[F11] on Excel Source.Outputs[Excel Source Output]. The column status returned was: "DBSTATUS_UNAVAILABLE".
Error: 0xC0209029 at DataInputUni, Excel Source [12]: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "Excel Source.Outputs[Excel Source Output].Columns[F11]" failed because error code 0xC0209071 occurred, and the error row disposition on "Excel Source.Outputs[Excel Source Output].Columns[F11]" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
Error: 0xC0047038 at DataInputUni, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on Excel Source returned error code 0xC0209029. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
It sounds like the Excel driver isn't reading enough data when guessing the datatype. In addition to setting ;Extended Properties="IMEX=1" in the connection string as per the comments, set the TypeGuessRows registry value to 0. Its location depends on which version of Office is installed; it is probably at one of the following keys:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\OFFICE NUMERICAL VERSION\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Office\OFFICE NUMERICAL VERSION\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Setting TypeGuessRows to 0 causes the entire column to be scanned when guessing the datatype. Setting IMEX=1 causes data to be returned as text (this can be altered in the registry) when mixed values are encountered. Omitting IMEX=1 causes data that does not match the guessed datatype to be returned as null. IMEX is thus less important than TypeGuessRows: for a column with mixed values, setting it only makes a difference if enough variety is encountered in the rows that are scanned (the first 8 by default).
http://microsoft-ssis.blogspot.com/2011/06/mixed-data-types-in-excel-column.html
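To make the interaction between TypeGuessRows and IMEX described above more concrete, here is a toy model of it. It is only a simplified imitation of the behaviour as described in this answer, not the Jet/ACE driver's real algorithm, and the names TypeGuessModel and Import are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Toy model only (hypothetical names): imitates the described behaviour,
// it does not call or reproduce the real Excel driver.
public static class TypeGuessModel
{
    // column:        cell contents as text, top to bottom
    // typeGuessRows: number of rows sampled, 0 meaning the whole column
    // imex:          true models IMEX=1 in the connection string
    public static List<string> Import(IList<string> column, int typeGuessRows, bool imex)
    {
        IEnumerable<string> sample =
            typeGuessRows == 0 ? column : column.Take(typeGuessRows);

        int numbers = sample.Count(IsNumber);
        int texts = sample.Count() - numbers;

        // The column becomes text if the sample is mostly text, or if the
        // sample contains any text at all and IMEX=1 is set.
        bool asText = texts > 0 && (imex || texts > numbers);

        // In a numeric column, values that are not numbers come back as null.
        return asText
            ? column.ToList()
            : column.Select(v => IsNumber(v) ? v : null).ToList();
    }

    private static bool IsNumber(string value)
    {
        return double.TryParse(value, NumberStyles.Float,
                               CultureInfo.InvariantCulture, out _);
    }
}
```

With nine numeric invoice numbers followed by "MAGI:1326564", the model returns null for the last row when typeGuessRows is 8, even with imex set, because the sample never sees the text value. With typeGuessRows set to 0 and imex set, the value survives as text.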
Thanks to Caius Jard for his answer. I found a solution for my problem. I first tried changing the output file format of the report to CSV, but this made things worse: with CSV the cells were not scanned at all and everything was assigned as string, which caused issues with importing. I then tried .xlsx (Excel 2007), which meant a new connection manager, and got this as the connection string:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=L:\MACROS\SSIS\Input\A2_POST_TEST20190103214110525.xlsx;Extended Properties="EXCEL 12.0 XML;HDR=NO";
Instead of adding everything Caius suggested again, I tried changing it to this:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=L:\MACROS\SSIS\Input\A2_POST_TEST20190103214110525.xlsx;Extended Properties="EXCEL 12.0 XML;HDR=NO;IMEX=1";
This fixed my problem!