XSLT to process huge XML files (almost 5 GB)


I am trying to find a reliable way to transform huge XML files (almost 5 GB) using XSLT.

Here is what I have tried so far:

  1. Using the MSXML Parser 4.0 (SP3) from the command line:

>msxsl.exe myfile.xml mysheet.xslt -o output.xml

This runs out of memory (error code 0x8007000e) on files larger than 800 MB.

  2. Using Mozilla Firefox or IE, applying the XSLT through a processing instruction:

<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet href="mysheet.xslt" type="text/xsl" ?>
<root>...

The browser crashes after a couple of minutes.

  3. Writing my own XML reader in PHP (version 5.4.22) on Windows and selecting the elements I need with XPath:

<?php
ini_set('max_execution_time', 0);
ini_set('memory_limit', '-1');

$xml = simplexml_load_file('myfile.xml');

foreach ($xml->xpath('/root/node/atribute[@id="value"]') as $result) {
    ...
    ... ...
}
... ... ...

The Apache server crashes.

Please share your experiences in this area. Would writing a class in Java be an option?

P.S. I don't want to use software like XmlSplit or something!

java
php
xml
xslt
xpath

1 Answer


For a 5 GB source document you'll need a streaming processor, and that means XSLT 3.0, which currently has two implementations, Saxon-EE and Exselt. Of course, not all transformations are streamable (sorting is tricky, for example), but if you describe the transformation you want to perform, or give a non-streaming version of it, then I'm sure we can help you turn it into something that works under streaming.
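To illustrate what "streaming" buys you, and since the question asks about Java, here is a minimal sketch using the JDK's built-in StAX pull parser. Note that this is not XSLT 3.0 streaming and does not use Saxon's or Exselt's API; it only shows the same underlying idea of a single forward pass that never builds the whole tree in memory. The element and attribute names are taken from the XPath in the question, while the class name `StreamingMatchCounter` and the method `countMatches` are hypothetical. It merely counts the matching elements, so treat it as a starting point rather than a replacement for the transformation.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of any library.
public class StreamingMatchCounter {

    // Counts elements matching /root/node/atribute[@id = idValue] in one
    // forward pass. Memory use depends on nesting depth, not on file size.
    public static long countMatches(InputStream in, String idValue)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not load DTDs or external entities for untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        Deque<String> path = new ArrayDeque<>(); // names of the currently open elements
        long matches = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.addLast(reader.getLocalName());
                    if (path.size() == 3
                            && String.join("/", path).equals("root/node/atribute")
                            && idValue.equals(reader.getAttributeValue(null, "id"))) {
                        matches++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.removeLast();
                }
            }
        } finally {
            reader.close();
        }
        return matches;
    }

    // Usage: java StreamingMatchCounter myfile.xml value
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(countMatches(in, args[1]));
        }
    }
}
```

Because only the stack of open element names is kept, the default JVM heap should be enough regardless of input size. Anything that needs to look backwards or across the whole document (sorting, for instance) has the same difficulty here as it has in streamed XSLT.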

answered on Stack Overflow Oct 29, 2015 by Michael Kay

User contributions licensed under CC BY-SA 3.0