Skip to main content


Content operations Podcast Podcast transcript Tools XML

Balancing CMS and CCMS implementation (podcast, part 1)

In episode 142 of The Content Strategy Experts Podcast, Gretyl Kinsey and Christine Cuellar discuss balancing the implementation of a content management system (CMS), and component content management system (CCMS). This is part one of a two-part podcast.

“When you have two types of content produced by your organization and different groups in charge of that, and maybe they’re in two different systems, that it’s really important to get those groups working together so that they can understand that those priorities don’t need to be competing, they just need to be balanced.”

— Gretyl Kinsey

Read More
DITA Tools

DITA to InDesign: getting your paragraph styles in order

Getting your DITA content into a high-design format like InDesign is a tricky prospect. The biggest stumbling block is the fact that there is no intrinsic link between your ICML and the template that you flow it into. In the end, your InDesign template (you’re using one, right?) is the most important part of a DITA to ICML workflow; it contains the actual styles that will control how your output appears.

Read More
DITA Tools

SubjectScheme explained

Your project is coming along nicely. You have your workflow ready, your style guides are composed, and things are looking up. However, you have complex metadata needs that are starting to cause problems. You need a way to ensure that authors are only using valid attribute values, and that your publication pipeline isn’t going to suffer. This is a situation that calls for a subjectScheme.

Read More
News Tools

LearningDITA and Oxygen XML Web Author

Since Scriptorium first announced the availability of, we have had more than 1,100 subscribers to our free online DITA courses. To complete the exercises in LearningDITA, we have recommended that students install an XML editor. This has presented a difficulty to some because they cannot or do not want to download and install an editor.

We’re happy to say this limitation now gone.

Read More
Tools Webinar

Webcast: HTML5 and its impact on technical communication

In this webcast recording, guest presenter Peter Lubbers gives a fast-paced overview of HTML5 with a focus on how it affects the tech comm field. He covers what exactly HTML5 is, why you should care, and how you can develop with HTML5. The session covers which browsers support which features, and how you can make the new features work in older browsers so you can start using HTML5 today.

Read More

White paper on whitespace (and removing it)

When I first started importing DITA and other XML files into structured FrameMaker, I was surprised by the excessive whitespace that appeared in the files. Even more surprising (in FrameMaker 8.0) were the red comments displayed via the EDD that said that some whitespace was invalid (these no longer appear in FrameMaker 9).

The whitespace was visible because of an odd decision by Adobe to handle all XML whitespace as if it were significant. (XML divides the world into significant and insignificant whitespace; most XML tools treat whitespace as insignficant except where necessary…think <codeblock> elements). This approach to whitespace exists in both FrameMaker and InDesign.

At first I handled the whitespace on a case-by-case basis, removing it by hand or through regular expressions. Eventually, I realized this was a more serious problem and created an XSL transform to eliminate the white space as a part of preprocessing. By using XSL that was acceptable to Xalan (not that hard), the transform can be integrated into a FrameMaker structured application.

I figured this whitespace problem must be affecting (and frustrating) more than a few of you out there,
so I made the stylesheet available on the Scriptorium web site. I also wrote a white paper “Removing XML whitespace in structured FrameMaker documents” that describes describes the XSL that went into the stylesheet and how to integrate it with your FrameMaker structured applications.

The white paper is available on the Scriptorium web site. Information about how to download the stylesheet is in the white paper.

If the stylesheet and whitepaper are useful to you, let us know!

Read More

Adding a DOCTYPE declaration on XSL output

In a posting a few weeks ago I discussed how to ignore the DOCTYPE declaration when processing XML through XSL. What I left unaddressed was how to add the DOCTYPE declaration back to the files. Several people have told me they’re tired of waiting for the other shoe to drop, so here’s how to add a DOCTYPE declaration.

First off: the easy solution. If the documents you are transforming always use the same DOCTYPE, you can use the doctype-public and doctype-system attributes in the <xsl:output> directive. When you specify these attributes, XSL inserts the DOCTYPE automatically.

However, if the DOCTYPE varies from file to file, you’ll have to insert the DOCTYPE declaration from your XSL stylesheet. In DITA files (and in many other XML architectures), the DOCTYPE is directly related to the root element of the document being processed. This means you can detect the name of the root element and use standard XSL to insert a new DOCTYPE declaration.

Before you charge ahead and drop a DOCTYPE declaration into your files, understand that the DOCTYPE declaration is not valid XML. If you try to emit it literally, your XSL processor will complain. Instead, you’ll have to:

  • Use entities for the less-than (“<” – “&lt;”) and greater-than (“>” – “&gt;”) signs, and
  • Disable output escaping so that the entities are actually emitted as less-than or greater-than signs (output escaping will convert them back to entities, which is precisely what you don’t want).

There are at least two possible approaches for adding DOCTYPE to your documents: use an <xsl:choose> statement to select a DOCTYPE, or construct the DOCTYPE using the XSL concat() function.

To insert the DOCTYPE declaration with an <xsl:choose> statement, use the document’s root element to select which DOCTYPE declaration to insert. Note that the entities “&gt;” and “&lt;” aren’t HTML errors in this post, they are what you need to use. Also note that the DOCTYPE statement text in this template is left-aligned so that the output DOCTYPE declarations will be left aligned. Most parsers seem to tolerate whitespace before the DOCTYPE declaration, but I prefer to err on the side of caution:

&lt;xsl:template match="/"&gt;
&lt;xsl:when test="name(node()[1]) = 'topic'"&gt;
&lt;xsl:text disable-output-escaping="yes"&gt;
&lt;!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd"&gt;
&lt;xsl:when test="name(node()[1]) = 'concept'"&gt;
&lt;xsl:text disable-output-escaping="yes"&gt;
&lt;!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"&gt;
&lt;xsl:when test="name(node()[1]) = 'task'"&gt;
&lt;xsl:text disable-output-escaping="yes"&gt;
&lt;!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd"&gt;
&lt;xsl:when test="name(node()[1]) = 'reference'"&gt;
&lt;xsl:text disable-output-escaping="yes"&gt;
&lt;!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd"&gt;
&lt;xsl:apply-templates select="node()"/&gt;

The preceding example contains statements for the topic, concept, task, and reference topic types; if you use other topic types, you’ll need to add additional statements. Rather than write a statement for each DOCTYPE, a more general approach is to process the name of the root element and construct the DOCTYPE declaration using the XSL concat() function.

&lt;xsl:variable name="ALPHA_UC" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/&gt;
&lt;xsl:variable name="ALPHA_LC" select="'abcdefghijklmnopqrstuvwxyz'"/&gt;
&lt;xsl:variable name="NEWLINE" select="'&amp;#x0A;'"/&gt;

&lt;xsl:template match="/"&gt;
&lt;xsl:call-template name="add-doctype"&gt;
&lt;xsl:with-param name="root" select="name(node()[1])"/&gt;
&lt;xsl:apply-templates select="node()"/&gt;

<span style="color: green;">&lt;-- Create a doctype based on the root element --&gt;</span>
&lt;xsl:template name="add-doctype"&gt;
&lt;xsl:param name="root"/&gt;
<span style="color: green;">&lt;-- Create an init-cap version of the root element name. --&gt;</span>
&lt;xsl:variable name="initcap_root"&gt;
translate(substring($root,2 ),$ALPHA_UC,$ALPHA_LC))"
<span style="color: green;">&lt;-- Build the DOCTYPE by concatenating pieces.</span>
<span style="color: green;">Note that XSL syntax requires you to use the &amp;quot; entities for</span>
<span style="color: green;">quotation marks ("). --&gt;</span>

&lt;xsl:variable name="doctype"
select="concat('!DOCTYPE ',
' PUBLIC &amp;quot;-//OASIS//DTD DITA ',
'//EN&amp;quot; &amp;quot;',
'.dtd&amp;quot;') "/&gt;
&lt;xsl:value-of select="$NEWLINE"/&gt;
<span style="color: green;">&lt;-- Output the DOCTYPE surrounded by &lt; and &gt;. --&gt;</span>
&lt;xsl:text disable-output-escaping="yes"&gt;&lt;
&lt;xsl:value-of select="$doctype"/&gt;
&lt;xsl:text disable-output-escaping="yes"&gt;&gt;
&lt;xsl:value-of select="$NEWLINE"/&gt;

The one caveat about this approach is that it depends on a consistent portion of the public ID form (“-//OASIS//DTD DITA “). If there are differences in the public ID for your various DOCTYPE declarations, those differences may complicate the template.

So there you have it: DOCTYPEs in a flash. Just remember to use disable-output-escaping=”yes” and use entities where appropriate and you’ll be fine.

Read More

Ignoring DOCTYPE in XSL Transforms using Saxon 9B

Recently I had to write some XSL transforms in which I wanted to ignore the DOCTYPE declarations in the source XML files. In one case, I didn’t have access to the DTD (and the files wouldn’t have validate even if I did). In the other case, the XML files were DITA files, but I had no need or interest in validating the files; I simply needed to run a transform that modified some character data in the files.

In the first case, I ended up writing a couple of SED scripts that removed and re-inserted the DOCTYPE declaration. By the time I encountered the second case, I wanted to do something less ham-fisted, so I started investigating how to direct Saxon to ignore the DOCTYPE declaration.

My first thought was to use the -x switch in Saxon. Perhaps I didn’t use it correctly, but I couldn’t get it to work. Even though I was using a non-validating parser (Piccolo), Saxon kept telling me that the DTD couldn’t be found.

I went back to the drawing board (aka Google) and found a note from Michael Kay that said, “to ignore the DTD completely, you need to use a catalog that redirects the DTD reference to some dummy DTD.” Michael provided a link to a very useful page in the Saxon Wiki that discussed using a catalog with Saxon. After a bit of experimentation, I got it working correctly. In this blog post, I’ve distilled the information to make it useful to others who need to ignore the DOCTYPE in their XSL.

Before I describe the catalog implementation, I’d like to point out a simple solution. This solution works best when a set of XML files are in a single directory and all files use the same DOCTYPE declaration in which the system ID specifies a file:

&lt;!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd"&gt;

In this case, you don’t need a catalog. It’s easier to create an empty file named “topic.dtd” (a dummy DTD) and save it in the same directory as the XML files. The XML parser looks first for the system ID; if it finds a DTD file, it uses it. Case closed.

However, there are many cases in which this simple solution doesn’t work. The system ID (“topic.dtd” in the previous example) might specify a path that cannot be reproduced on your machine…or the XML files could be spread across multiple directories…or there could be many different DOCTYPEs…or…

In these cases, it makes more sense to set up a catalog file. To specify a catalog with Saxon, you must use the XML Commons Resolver from Apache (resolver.jar). You can download the resolver from SourceForge. The good thing is, if you have the DITA Open Toolkit installed on your machine, you already have a copy of the resolver.jar file. The file is in %DITA-OT%libresolver.jar. You specify the class path for the resolver in the Java command using the -cp switch (shown below).

The resolver requires you to specify a catalog.xml file, in which you map the the public ID (or system ID) in the DOCTYPE declaration to a local DTD file. The catalog.xml file I created looks like this:

&lt;catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
&lt;public publicId="-//OASIS//DTD DITA Topic//EN" uri="dummy.dtd"/&gt;
&lt;public publicId="-//OASIS//DTD DITA Concept//EN" uri="dummy.dtd"/&gt;
&lt;public publicId="-//OASIS//DTD DITA Task//EN" uri="dummy.dtd"/&gt;
&lt;public publicId="-//OASIS//DTD DITA Reference//EN" uri="dummy.dtd"/&gt;

Note that the uri attribute in each entry points to a dummy DTD (an empty file). The file path used for the dummy.dtd file is relative to the location of the catalog file.

Putting it all together, I created a DOS batch file to run Java and invoke Saxon:

java -cp c:saxon9saxon9.jar;C:DITA-OT1.4.3libresolver.jar ˆ
-Dxml.catalog.files=catalog.xml ˆ
net.sf.saxon.Transformˆ ˆ ˆ ˆ
-xsl:my_transform.xsl ˆ

The Java -cp switch adds class paths for the saxon.jar and resolver.jar files. The -D switch sets the system property xml.catalog.files to the location of the catalog.xml file.

The switches following the Java class (net.sf.saxon.Transform) are Saxon switches.

  • -r – class of the resolver
  • -x – class of the source file parser
  • -y – class of the stylesheet parser

Note, I’m using Windows (DOS) syntax here. If you are using Unix (Linux, Mac), separate the paths in the class path with a colon (:) and use the backslash () as a line continuation character.

When you run Saxon this way, you’ll notice two things: first, Saxon doesn’t complain about the DTD (yay!), but secondly, there is no DOCTYPE declaration in the output. I’ll address how to add the DOCTYPE declaration back to the output XML file in my next blog post.

Read More

Don’t type, drag to the cmd window

I spend a good deal of time with a Windows cmd.exe window open on my desktop. If I’m not running the DITA OT, I’m testing some Perl script, or Ant, or Python, or who knows.

A few years ago (in the Windows 98 days), I discovered a nifty cmd window trick. People are consistently amazed when I demonstrate it to them. Now I’m going to share it with you.

Say you need to change directory to some long and gnarly path name. You could type the whole thing in. Or, if you have Windows Explorer open on your desktop, you can:

  1. Type “cd ” in the cmd window (the space is important).
  2. Go to Windows Explorer and find the folder you want to navigate to.
  3. Drag and drop the folder from Windows Explorer to the cmd window.

Hey presto! The path name is copied to the cmd window. What’s more, if there are spaces in the path, the path is automatically quoted.

Now you can click in the cmd window and press Enter to perform the command.

Cool! No more typing long path names for this ToolSmith.

This works for filenames too. If I’m running a Perl script that needs to work on a file way down my directory tree, I type “perl “, then drag and drop the file name from Windows Explorer into my cmd window.

I’ll keep adding more ToolSmith’s Tricks as I use them. What’s your favorite trick?

Read More