Skip to main content

Tag: xsl

Tools

White paper on whitespace (and removing it)

When I first started importing DITA and other XML files into structured FrameMaker, I was surprised by the excessive whitespace that appeared in the files. Even more surprising (in FrameMaker 8.0) were the red comments displayed via the EDD that said that some whitespace was invalid (these no longer appear in FrameMaker 9).

The whitespace was visible because of an odd decision by Adobe to handle all XML whitespace as if it were significant. (XML divides the world into significant and insignificant whitespace; most XML tools treat whitespace as insignficant except where necessary…think <codeblock> elements). This approach to whitespace exists in both FrameMaker and InDesign.

At first I handled the whitespace on a case-by-case basis, removing it by hand or through regular expressions. Eventually, I realized this was a more serious problem and created an XSL transform to eliminate the white space as a part of preprocessing. By using XSL that was acceptable to Xalan (not that hard), the transform can be integrated into a FrameMaker structured application.

I figured this whitespace problem must be affecting (and frustrating) more than a few of you out there,
so I made the stylesheet available on the Scriptorium web site. I also wrote a white paper “Removing XML whitespace in structured FrameMaker documents” that describes describes the XSL that went into the stylesheet and how to integrate it with your FrameMaker structured applications.

The white paper is available on the Scriptorium web site. Information about how to download the stylesheet is in the white paper.

If the stylesheet and whitepaper are useful to you, let us know!

Read More
Tools

Ignoring DOCTYPE in XSL Transforms using Saxon 9B

Recently I had to write some XSL transforms in which I wanted to ignore the DOCTYPE declarations in the source XML files. In one case, I didn’t have access to the DTD (and the files wouldn’t have validate even if I did). In the other case, the XML files were DITA files, but I had no need or interest in validating the files; I simply needed to run a transform that modified some character data in the files.

In the first case, I ended up writing a couple of SED scripts that removed and re-inserted the DOCTYPE declaration. By the time I encountered the second case, I wanted to do something less ham-fisted, so I started investigating how to direct Saxon to ignore the DOCTYPE declaration.

My first thought was to use the -x switch in Saxon. Perhaps I didn’t use it correctly, but I couldn’t get it to work. Even though I was using a non-validating parser (Piccolo), Saxon kept telling me that the DTD couldn’t be found.

I went back to the drawing board (aka Google) and found a note from Michael Kay that said, “to ignore the DTD completely, you need to use a catalog that redirects the DTD reference to some dummy DTD.” Michael provided a link to a very useful page in the Saxon Wiki that discussed using a catalog with Saxon. After a bit of experimentation, I got it working correctly. In this blog post, I’ve distilled the information to make it useful to others who need to ignore the DOCTYPE in their XSL.

Before I describe the catalog implementation, I’d like to point out a simple solution. This solution works best when a set of XML files are in a single directory and all files use the same DOCTYPE declaration in which the system ID specifies a file:

&lt;!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd"&gt;

In this case, you don’t need a catalog. It’s easier to create an empty file named “topic.dtd” (a dummy DTD) and save it in the same directory as the XML files. The XML parser looks first for the system ID; if it finds a DTD file, it uses it. Case closed.

However, there are many cases in which this simple solution doesn’t work. The system ID (“topic.dtd” in the previous example) might specify a path that cannot be reproduced on your machine…or the XML files could be spread across multiple directories…or there could be many different DOCTYPEs…or…

In these cases, it makes more sense to set up a catalog file. To specify a catalog with Saxon, you must use the XML Commons Resolver from Apache (resolver.jar). You can download the resolver from SourceForge. The good thing is, if you have the DITA Open Toolkit installed on your machine, you already have a copy of the resolver.jar file. The file is in %DITA-OT%libresolver.jar. You specify the class path for the resolver in the Java command using the -cp switch (shown below).

The resolver requires you to specify a catalog.xml file, in which you map the the public ID (or system ID) in the DOCTYPE declaration to a local DTD file. The catalog.xml file I created looks like this:

&lt;catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
&lt;public publicId="-//OASIS//DTD DITA Topic//EN" uri="dummy.dtd"/&gt;
&lt;public publicId="-//OASIS//DTD DITA Concept//EN" uri="dummy.dtd"/&gt;
&lt;public publicId="-//OASIS//DTD DITA Task//EN" uri="dummy.dtd"/&gt;
&lt;public publicId="-//OASIS//DTD DITA Reference//EN" uri="dummy.dtd"/&gt;
&lt;/catalog&gt;

Note that the uri attribute in each entry points to a dummy DTD (an empty file). The file path used for the dummy.dtd file is relative to the location of the catalog file.

Putting it all together, I created a DOS batch file to run Java and invoke Saxon:

java -cp c:saxon9saxon9.jar;C:DITA-OT1.4.3libresolver.jar ˆ
-Dxml.catalog.files=catalog.xml ˆ
net.sf.saxon.Transformˆ
-r:org.apache.xml.resolver.tools.CatalogResolver ˆ
-x:org.apache.xml.resolver.tools.ResolvingXMLReader ˆ
-y:org.apache.xml.resolver.tools.ResolvingXMLReader ˆ
-xsl:my_transform.xsl ˆ
-s:my_content.xml

The Java -cp switch adds class paths for the saxon.jar and resolver.jar files. The -D switch sets the system property xml.catalog.files to the location of the catalog.xml file.

The switches following the Java class (net.sf.saxon.Transform) are Saxon switches.

  • -r – class of the resolver
  • -x – class of the source file parser
  • -y – class of the stylesheet parser

Note, I’m using Windows (DOS) syntax here. If you are using Unix (Linux, Mac), separate the paths in the class path with a colon (:) and use the backslash () as a line continuation character.

When you run Saxon this way, you’ll notice two things: first, Saxon doesn’t complain about the DTD (yay!), but secondly, there is no DOCTYPE declaration in the output. I’ll address how to add the DOCTYPE declaration back to the output XML file in my next blog post.

Read More
DITA

An incomplete puzzle: DITA OT stylesheets

A recent post on the dita-users Yahoo group asked how to customize the DITA OT stylesheets in view of the fact that there isn’t much documentation available.

From my work customizing and otherwise perverting the DITA OT, I can sympathize with these frustrations. When I started investigating OT customizations, I found many well-crafted tutorials on how to customize and specialize the OT. These were a great starting point, but they only got me so far. In its current state, the documentation is an incomplete jigsaw puzzle; the trees and buildings are filled in nicely, but the sky is still waiting for someone with patience. (Block that metaphor!)

Because there is no documentation available at the individual template level, you need to reconsider the task at hand. I look on it as debugging, decoding, or sleuthing. With that in mind, I find the following to be very useful:

  • Find a good visual grep-like utility. I use AgentRansack, a free version of FileLocator Pro (it’s free and amazing). This enables me to locate all files that contain a particular class identifier. The visual aspect of the tool allows me to see the context quickly.
  • Use a programmer’s editor that supports XML and XSL. We use Oxygen. Not only does it help check validity and closes tags automatically, but it also provides a handy sidebar that lists the templates and their modes.
  • Liberally spread <xsl:comment> or <xsl:message> directives through the stylesheets you’re examining. That helps figure out where you are. Use <xsl:value-of> or <xsl:copy-of> to figure out what you’ve got.
  • Once you’ve figured out what happens in one of the OT templates, add comments. Now the next time you come back to it, you won’t waste time.

Probably the best form of documentation that the OT could provide here is additional comments in the stylesheets, particularly about the order of processing. I find I add many comments about where to find the template that handles nodes from an <xsl:apply-templates> directive.

One further note. On Tuesday, September 23, I’ll be presenting the third of our “Best Practices in Structured Authoring and Publishing” joint Webinar series with JustSystems. In this presentation I’ll describe a number of approaches you can use to customize DITA OT output. For more information, visit the JustSystems web site.

Read More
Tools

Writing better XSL

Jeni Tennison has a new blog. Her latest post has tips on when to use template matching, named templates, and for-each statements.

In my experience, most people who are new to XSL overuse for-each loops, because they most closely resemble familiar programming constructs.

Read More