I need to use some data from a xml-file which size is approx > 2 GB (you can have a look: https://leidata.gleif.org/api/v1/concatenated-files/lei2/20180128/zip)
I need the data in access and tried to read the file with the following vba-code:
Public Function ReadLei(strFile As String) As Long
Dim xmlLeiData As New MSXML2.DOMDocument
With xmlLeiData
.async = False
.preserveWhiteSpace = False
.validateOnParse = False
.resolveExternals = False
End With
If xmlLeiData.Load(strFile) = True Then
MsgBox "ok"
Else
MsgBox xmlLeiData.parseError
End If
ReadLei = 0
End Function
its end up with an 0x8007000E out of memory error.
Are there other ways to read and parse such large XML-files via vba/Access?
I don't know specifically what might work under VBA / Access, but building a DOM for this size input probably isn't viable.
The MSXML parser (which you are using) also has a SAX api, where the parser reads the input file and notifies the application of events like start tags, end tags, attributes, and text nodes. This may meet your needs, but the programming can be tricky.
Microsoft's .NET parser (System.Xml) also has a "pull" API, allowing the application to call a "nextEvent()" method provided by the parser, so you can read your way through the file in a structured way. Many people find this a bit easier to use than the SAX approach, though it's still very low-level coding.
A radically different approach would be to use streaming XSLT 3.0 (perhaps in a transformation to reduce the file to a manageable size, which you could then access using DOM in your accustomed way). For that you'll need the commercial version of Saxon. It will cost a bit more but save you time.
Update: you say in a comment that the file contains 1m records, and you only want to retain 4 or 5 columns. You can cut down the file like this in a streaming XSLT 3.0 transformation, where P, Q, R, and S are the wanted columns:
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:mode streamable="yes" on-no-match="deep-skip"/>
<xsl:template match="/*">
<xsl:copy>
<xsl:apply-templates select="*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="P|Q|R|S">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:transform>
Since you're not providing specifics, I can't give you specifics.
You can first transform the XML to only contain the data you want using an xslt file and the Application.TransformXML
method (documentation), and then import the XML using the Application.ImportXML
method (documentation).
Note that the maximum size of an Access database is 2GB. Importing large files can quickly exceed this limit.
Thanks for the (huge) sample-file. I have been processing xml files for more than 15 years now. I allways had doubts about how Ms Access would perform when nearing the GB-limits.
In my experience, and now confirmed, there is only one winner :
Open FileURL For Input As #FileNum
. To be combined with InputLine = Input(1000, #FileNum) ' read some 1.000 characters
. Basically, just treat the XML as a plain text-file.
The coding would be easier if Line Input
could be used in stead of Input
, but in your sample this is not the case. Your sample file uses vbLf
to mark the end of a line in the text, and Line Input
needs vbCrLf
to work properly.
I ended up with a small application that first scans a file for the different occuring tags. Afterwards, these tags can be assigned to several tasks as :
In a second full read all the values are assigned to their destination fields in the database.
I'm gonna try to clarify a bit by inserting some code (as of 02 Feb 2018 15h London time, I have to dash, I am gonna come back to it at a later point of time)
Option Compare Database
Option Explicit
Dim marrKnownTags() As String
Public Sub ReadFile2GB()
Dim FileNum As Integer
Dim InputLine As String
Call init_marrKnownTags
FileNum = FreeFile
Open "X:\20180128-gleif-concatenated-file-lei2.xml" For Input As #FileNum
Do While Not EOF(FileNum)
InputLine = Input(99000, #FileNum) ' read some 99.000 characters
Call processTemporaryBlock(InputLine)
...
Loop
Close #FileNum
End Sub
Public Function positionCrOfLf(PieceToScan As String) As Long
Dim Pos As Long
Pos = 0
If Pos = 0 Then
Pos = InStr(PieceToScan, vbCrLf)
End If
If Pos = 0 Then
Pos = InStr(PieceToScan, vbLf)
End If
If Pos = 0 Then
Pos = InStr(PieceToScan, vbCr)
End If
'Debug.Print "fie positionCrOfLf := " & Pos
positionCrOfLf = Pos
End Function
Private Sub init_marrKnownTags()
ReDim Preserve marrKnownTags(333)
marrKnownTags(1) = "<?xml version=" ' start of xml
marrKnownTags(10) = "<lei:LEIData" ' Table_01 Open
marrKnownTags(20) = "<lei:LEIHeader>" ' Table_02 Open
marrKnownTags(21) = "<lei:ContentDate>" ' field
marrKnownTags(22) = "<lei:FileContent>" ' field
marrKnownTags(23) = "<lei:RecordCount>" ' field
marrKnownTags(30) = "<lei:Extension>" ' Table_03 Open
marrKnownTags(40) = "<gleif:Sources>" ' Table_04 Open
marrKnownTags(41) = "<gleif:Source>" ' addnew record Table_04
marrKnownTags(42) = "<gleif:ContentDate>" ' field
marrKnownTags(43) = "<gleif:Originator>" ' field
marrKnownTags(44) = "<gleif:RecordCount>" ' field
marrKnownTags(45) = "</gleif:Source>" ' save this new record Table_04
marrKnownTags(46) = "</gleif:Sources>" ' Table_04 Close
marrKnownTags(31) = "</lei:Extension>" ' Table_03 Close
' ... some more child-tables in the future ??
marrKnownTags(129) = "</lei:Entity>" ' Table_12 Close ' close child table
marrKnownTags(140) = "<lei:Registration>" ' Table_14 Open
marrKnownTags(141) = "<lei:LastUpdateDate>" ' DO NOT SKIP field with "2017-11-30T15:06:27Z" =?= 2017-11-30 15:06:27
marrKnownTags(142) = "<lei:RegistrationStatus>" ' DO NOT SKIP field with "ISSUED"
marrKnownTags(149) = "</lei:Registration>" ' Table_14 Close
marrKnownTags(2) = "</lei:LEIRecord>" ' save this new record
marrKnownTags(2) = "</lei:LEIRecords>" ' Table_11 Close ' close child table
End Sub
Public Function processTemporaryBlock(ByVal TemporaryBlock As String)
Dim positionStart As Long, positionEnd As Long, positionLength As Long
Dim OneLine As String, searchTag As String
Dim indexArray As Long
Dim tagFoundYN As Boolean
positionStart = 1
positionEnd = positionCrOfLf(TemporaryBlock)
Do While positionEnd > 0
OneLine = trim(Mid(TemporaryBlock, positionStart, positionEnd - 1))
Debug.Print "OneLine := " & OneLine
tagFoundYN = False
For indexArray = LBound(marrKnownTags) To UBound(marrKnownTags)
searchTag = marrKnownTags(indexArray)
searchTag = Trim(searchTag)
If searchTag = "" Then
' skip
Else
If Left(OneLine, Len(searchTag)) = searchTag Then
' Call processTag(OneLine)
tagFoundYN = True
exit for
End If
End If
Next
positionStart = positionStart + positionEnd
positionEnd = positionCrOfLf(Mid(TemporaryBlock, positionStart))
Loop
End Sub
User contributions licensed under CC BY-SA 3.0