Adding a DOCTYPE declaration on XSL output
In a posting a few weeks ago I discussed how to ignore the DOCTYPE declaration when processing XML through XSL. What I left unaddressed was how to add the DOCTYPE declaration back to the files. Several people have told me they’re tired of waiting for the other shoe to drop, so here’s how to add a DOCTYPE declaration.
First off: the easy solution. If the documents you are transforming always use the same DOCTYPE, you can use the doctype-public and doctype-system attributes on the <xsl:output> element. When you specify these attributes, the XSL processor inserts the DOCTYPE automatically.
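As a minimal sketch of the fixed-DOCTYPE case, the stylesheet below copies its input unchanged and lets the processor write the declaration. The DITA topic identifiers are used only as an example; substitute the public and system IDs your documents need.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The processor writes the DOCTYPE itself, using the name of the
       first element in the result as the document type name. -->
  <xsl:output method="xml"
      doctype-public="-//OASIS//DTD DITA Topic//EN"
      doctype-system="topic.dtd"/>

  <!-- Identity transform: copy every node and attribute through. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because these attribute values are fixed in the stylesheet, this only works when every output file needs the same identifiers.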
However, if the DOCTYPE varies from file to file, you’ll have to insert the DOCTYPE declaration from your XSL stylesheet. In DITA files (and in many other XML architectures), the DOCTYPE is directly related to the root element of the document being processed. This means you can detect the name of the root element and use standard XSL to insert a new DOCTYPE declaration.
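The detection step on its own can be sketched as follows. This small stylesheet only prints the root element name as text, to show the XPath involved. It uses name(/*), which selects the document element directly; the templates later in this post use name(node()[1]) from the root, which gives the same answer as long as no comment or processing instruction comes before the root element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- /* selects the document element, so name(/*) is its name
       even when a comment or processing instruction comes first. -->
  <xsl:template match="/">
    <xsl:value-of select="name(/*)"/>
  </xsl:template>

</xsl:stylesheet>
```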
Before you charge ahead and drop a DOCTYPE declaration into your files, understand that a literal DOCTYPE declaration is not well-formed XML inside a template. If you try to emit it literally, your XSL processor will complain. Instead, you’ll have to:
- Use entities for the less-than (“&lt;” for “<”) and greater-than (“&gt;” for “>”) signs, and
- Disable output escaping so that the entities are actually emitted as less-than or greater-than signs (output escaping will convert them back to entities, which is precisely what you don’t want).
There are at least two possible approaches for adding DOCTYPE to your documents: use an <xsl:choose> statement to select a DOCTYPE, or construct the DOCTYPE using the XSL concat() function.
To insert the DOCTYPE declaration with an <xsl:choose> statement, use the document’s root element to select which DOCTYPE declaration to insert. Note that the entities “&gt;” and “&lt;” aren’t HTML errors in this post; they are what you need to use. Also note that the DOCTYPE statement text in this template is left-aligned so that the output DOCTYPE declarations will be left-aligned. Most parsers seem to tolerate whitespace before the DOCTYPE declaration, but I prefer to err on the side of caution:
<!-- Pick the DOCTYPE that matches the root element, then process the document. -->
<xsl:template match="/">
  <xsl:choose>
    <xsl:when test="name(node()[1]) = 'topic'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd"&gt;
</xsl:text>
    </xsl:when>
    <xsl:when test="name(node()[1]) = 'concept'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"&gt;
</xsl:text>
    </xsl:when>
    <xsl:when test="name(node()[1]) = 'task'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd"&gt;
</xsl:text>
    </xsl:when>
    <xsl:when test="name(node()[1]) = 'reference'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd"&gt;
</xsl:text>
    </xsl:when>
  </xsl:choose>
  <xsl:apply-templates select="node()"/>
</xsl:template>
The preceding example contains statements for the topic, concept, task, and reference topic types; if you use other topic types, you’ll need to add additional statements. Rather than write a statement for each DOCTYPE, a more general approach is to process the name of the root element and construct the DOCTYPE declaration using the XSL concat() function.
<xsl:variable name="ALPHA_UC" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="ALPHA_LC" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="NEWLINE" select="'&#x0A;'"/>

<xsl:template match="/">
  <xsl:call-template name="add-doctype">
    <xsl:with-param name="root" select="name(node()[1])"/>
  </xsl:call-template>
  <xsl:apply-templates select="node()"/>
</xsl:template>

<!-- Create a doctype based on the root element -->
<xsl:template name="add-doctype">
  <xsl:param name="root"/>
  <!-- Create an init-cap version of the root element name. -->
  <xsl:variable name="initcap_root">
    <xsl:value-of
      select="concat(translate(substring($root,1,1),$ALPHA_LC,$ALPHA_UC),
                     translate(substring($root,2),$ALPHA_UC,$ALPHA_LC))"
    />
  </xsl:variable>
  <!-- Build the DOCTYPE by concatenating pieces.
       Note that XSL syntax requires you to use the &quot; entities for
       quotation marks ("). -->
  <xsl:variable name="doctype"
    select="concat('!DOCTYPE ',
                   $root,
                   ' PUBLIC &quot;-//OASIS//DTD DITA ',
                   $initcap_root,
                   '//EN&quot; &quot;',
                   $root,
                   '.dtd&quot;')"/>
  <xsl:value-of select="$NEWLINE"/>
  <!-- Output the DOCTYPE surrounded by < and >. -->
  <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
  <xsl:value-of select="$doctype"/>
  <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>
The one caveat about this approach is that it assumes every public ID shares the same fixed portion (“-//OASIS//DTD DITA ”) and that the rest can be derived from the root element name. If the public IDs for your various DOCTYPE declarations differ in other ways, those differences may complicate the template.
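If the only difference is the leading part of the public ID, one possible way to soften this caveat is to make that part a stylesheet parameter. The sketch below is a compact variant of the concat() approach, not a replacement for it. The parameter name public-id-prefix is hypothetical, and the sketch still assumes the system ID is the root element name plus “.dtd”. The identity template is included only so the example runs on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: the fixed leading part of the public ID.
       Override it when invoking the processor if your DTDs use another. -->
  <xsl:param name="public-id-prefix" select="'-//OASIS//DTD DITA '"/>

  <xsl:variable name="ALPHA_UC" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="ALPHA_LC" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <xsl:template match="/">
    <!-- name(/*) is the document element's name. -->
    <xsl:variable name="root" select="name(/*)"/>
    <!-- Init-cap version of the root name, e.g. topic becomes Topic. -->
    <xsl:variable name="initcap_root" select="concat(
        translate(substring($root, 1, 1), $ALPHA_LC, $ALPHA_UC),
        translate(substring($root, 2), $ALPHA_UC, $ALPHA_LC))"/>

    <!-- Emit the declaration; only the angle brackets need unescaped output. -->
    <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE </xsl:text>
    <xsl:value-of select="concat($root, ' PUBLIC &quot;', $public-id-prefix,
        $initcap_root, '//EN&quot; &quot;', $root, '.dtd&quot;')"/>
    <xsl:text disable-output-escaping="yes">&gt;&#x0A;</xsl:text>

    <xsl:apply-templates select="node()"/>
  </xsl:template>

  <!-- Identity transform: copy every node and attribute through. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Whether a single prefix parameter is enough depends on how your public IDs actually differ. If the differences don't follow a pattern, the <xsl:choose> approach is probably easier to maintain.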
So there you have it: DOCTYPEs in a flash. Just remember to use disable-output-escaping="yes" and use entities where appropriate and you’ll be fine.
Docster
Thank you for this advice. I think the method I use to add a DOCTYPE declaration on XSL output is simpler:
<xsl:text disable-output-escaping="yes">
<![CDATA[
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
]]>
</xsl:text>