An XSL file that processes several XML files


I need an XSL file that reads several XML files and applies the same transformation to each. So far I've got this (sanitized version):

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>

<xsl:template match="document('first_xml_file.xml')">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="//outer_tag/inner_tag">
  ...stuff to appear in output for each file...
  <xsl:for-each select="outer_tag/inner_tag">
    ...stuff to appear in output for each inner_tag...
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet> 

The strategy is that once this works, I can add more <xsl:template> blocks to process more files.

Because a transformation tool always wants an XML file to work on, I created an empty XML file:

<?xml version="1.0"?>
<nothing/>

Then I ran Microsoft's transformation tool and got:

C:\...>msxsl empty.xml my_style_sheet.xsl

Error occurred while compiling stylesheet 'my_style_sheet.xsl'.

Code:   0x80004005
NodeTest expected here.

-->document<--('first_xml_file.xml')
C:\...>

I looked this up and found that the only functions allowed in a pattern are id() and key(). That precludes the use of document() -- and makes it impossible to write an XSL file that reads several XML files, at least the way I want to do it. What is the solution here?

xml
xslt

1 Answer


The usual working approach (at least in XSLT 1) is to use match="/*" on your original input and then pull in the other files from that template. You can either hard-code them, as in <xsl:apply-templates select="document('doc1.xml') | document('doc2.xml') | document('doc3.xml')"/>, or make the primary input a list of the secondary XML file URLs, e.g. <files><file>doc1.xml</file><file>doc2.xml</file><file>doc3.xml</file></files>, and then use <xsl:apply-templates select="document(/files/file)"/>.
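
A minimal sketch of the file-list variant, to make it concrete. It assumes that the primary input is the <files> list shown above and that every secondary file has outer_tag as its root element with inner_tag children. Those names come from the question, but the structure and the output text are only guesses for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Entry point: the root element of the primary input, assumed to be
       <files><file>doc1.xml</file>...</files>. -->
  <xsl:template match="/files">
    <!-- document() accepts a node-set: the string value of each <file>
         is loaded as a URI, resolved relative to the primary input. -->
    <xsl:apply-templates select="document(file)"/>
  </xsl:template>

  <!-- Applied once per secondary file (assumes outer_tag is its root element). -->
  <xsl:template match="outer_tag">
    <xsl:text>--- next file ---&#10;</xsl:text>
    <xsl:for-each select="inner_tag">
      <!-- Placeholder output: the text of each inner_tag on its own line. -->
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the list saved as, say, files.xml (a hypothetical name), this would be run the same way as in the question: msxsl files.xml my_style_sheet.xsl. The built-in template rules take care of stepping from each loaded document's root node down to outer_tag.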

Instead of matching on any document node, you would then write templates matching the elements in those referenced files. The only issue is that you might need to use modes to distinguish the primary document's root node or root element from the roots or elements of the secondary documents. Whether that is necessary depends on the structure of the documents and the organization of your templates.
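
As a rough sketch of how modes keep the two kinds of input apart: the mode name "secondary" and the output text below are made up for illustration, and the file names are the placeholders used above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Default (unnamed) mode: only the primary input is processed here. -->
  <xsl:template match="/*">
    <!-- Hand the secondary documents over in a separate, hypothetical mode. -->
    <xsl:apply-templates
        select="document('doc1.xml') | document('doc2.xml')"
        mode="secondary"/>
  </xsl:template>

  <!-- Same pattern as above, but it only fires in mode "secondary",
       so it cannot be confused with the primary input's root element. -->
  <xsl:template match="/*" mode="secondary">
    <xsl:text>root element: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Both templates use the pattern /*, yet they never compete, because a template with a mode attribute is only considered when templates are applied in that mode. The built-in rules exist for every mode, so the step from each document's root node to its root element stays in mode "secondary".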

In general, both of your match patterns seem rather unusual. The match="//outer_tag/inner_tag" could just as well be shortened to match="outer_tag/inner_tag" without any semantic difference.

And the match="document('first_xml_file.xml')" might be written as match="document-node()[. is document('first_xml_file.xml')]" if your aim really is to identify that particular document. That syntax is XSLT 2, though, so it doesn't help with MSXML.

XSLT 2 and 3 are more flexible and powerful, both in the match patterns they allow and in the functions they offer, such as collection and/or uri-collection. XSLT 2 and 3 are available on Windows from the command line using the Java or .NET version of Saxon 9 HE.
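
For comparison, a hedged XSLT 2 sketch using collection(). How a collection URI is interpreted is up to the processor. The ?select=*.xml form below follows Saxon's convention for directory collections, and the directory name "input" and the template name "main" are assumptions, as is the idea that outer_tag is the root element of each file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Named entry point, so no dummy primary input is needed. -->
  <xsl:template name="main">
    <!-- ASSUMPTION: Saxon-style collection URI; "input" is a hypothetical
         directory and ?select=*.xml picks the XML files in it. -->
    <xsl:apply-templates select="collection('input?select=*.xml')"/>
  </xsl:template>

  <!-- Applied once per file (assumes outer_tag is the root element). -->
  <xsl:template match="outer_tag">
    <!-- The separator attribute is XSLT 2: one inner_tag value per line. -->
    <xsl:value-of select="inner_tag" separator="&#10;"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon, a stylesheet like this would be started from the named template (its -it option) rather than from a source document, which removes the need for the empty XML file.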

answered on Stack Overflow Sep 6, 2019 by Martin Honnen

User contributions licensed under CC BY-SA 3.0