Beyond production: DITA transformations for QA
I’ve written in the past on how a QA mindset can improve the quality and consistency of your content. While a robust test set and test plan are useful, there’s another tool that you can use.
This post is part of Scriptorium’s 20th anniversary celebration.
Content creators love their tools. So much, in fact, that they sometimes mistake selecting tools for developing a content strategy.
Getting your DITA content into a high-design format like InDesign is a tricky prospect. The biggest stumbling block is the fact that there is no intrinsic link between your ICML and the template that you flow it into. In the end, your InDesign template (you’re using one, right?) is the most important part of a DITA to ICML workflow; it contains the actual styles that will control how your output appears.
We’ve written before on what lurks beneath the surface of an InDesign file, and how drastically it differs from the DITA standard. When you’re looking at going from DITA to InDesign, though, there’s a lot that you need to take into consideration before you jump in.
Your project is coming along nicely. You have your workflow ready, your style guides are composed, and things are looking up. However, you have complex metadata needs that are starting to cause problems. You need a way to ensure that authors are only using valid attribute values, and that your publication pipeline isn’t going to suffer. This is a situation that calls for a subjectScheme.
Since Scriptorium first announced the availability of LearningDITA.com, we have had more than 1,100 subscribers to our free online DITA courses. To complete the exercises in LearningDITA, we have recommended that students install an XML editor. This has presented a difficulty to some because they cannot or do not want to download and install an editor.
We’re happy to say this limitation is now gone.
Automated PDF formatting works well for technical communication. But what about highly designed content for printed books? How can companies enable flexibility in print/PDF layouts generated from structured content?
In this webcast recording, Alan Pringle discusses the challenges of ebook distribution and how Scriptorium has addressed them when selling EPUB and Kindle editions. Topics covered include:
In this webcast recording, George Bina shows you how to create DITA content from zero to a full deliverable using oXygen. The full deliverable leads to multiple publishing formats.
In this video recording, guest presenter Sarah Maddox explains why collaboration is a good thing, why a wiki is a good solution for it, and how to do it on Confluence.
Accessibility is a term commonly associated with the process of making content available for people with vision, hearing, and mobility impairments.
In this webcast recording, guest presenter Peter Lubbers gives a fast-paced overview of HTML5 with a focus on how it affects the tech comm field. He covers what exactly HTML5 is, why you should care, and how you can develop with HTML5. The session covers which browsers support which features, and how you can make the new features work in older browsers so you can start using HTML5 today.
Modifying FrameMaker cross-reference formats: it’s basic and one of the cool things about FrameMaker. But not if you’re editing DITA files using FrameMaker 9 or 10.
In this webcast, Simon Bate leads viewers through the key steps in using XSL (extensible stylesheet language) to perform XML-to-XML conversions, a process that differs from more traditional XML-to-PDF and XML-to-HTML conversions.
The ePub spec is long and very formal, but the format itself is fairly straightforward. And while building an ePub by hand is not complicated in itself, reworking content from other formats can be tricky.
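To illustrate the by-hand part, here is a minimal Python sketch of EPUB packaging, assuming you already have the package (OPF) document and the content files as text. The function name and the OEBPS/content.opf location are our own choices, not something the spec or this post prescribes. What the sketch does follow from the container format: the mimetype entry comes first and is stored uncompressed, and META-INF/container.xml points reading systems to the package file.

```python
import io
import zipfile

# Minimal META-INF/container.xml that points reading systems at the package file.
# Assumption: the OPF lives at OEBPS/content.opf; any path works if it matches.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_skeleton(opf: str, files: dict[str, str]) -> bytes:
    """Return the bytes of an EPUB zip built from an OPF document and content files.

    `files` maps archive paths (for example "OEBPS/ch1.xhtml") to their text.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must be the first file and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", opf)
        for path, text in files.items():
            zf.writestr(path, text)
    return buf.getvalue()
```

Writing the returned bytes to a .epub file gives you the archive; whether a reading system accepts it still depends on a valid OPF manifest and spine, which this sketch does not build.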
Many content management systems (CMSs) take over the responsibility of file naming. For the most part, this is fine and is actually necessary for maintaining cross-references and conrefs within the CMS. When you use the CMS to build a DITA map, the CMS uses its own names in the <topicref> elements.
This webcast demonstrates using the DITA-FMx plugin with FrameMaker 9 to author, edit, and create output from DITA content. Topics covered during the demo include creating DITA topics using different options and templates and generating a book from the map and then saving to a PDF file.
When I first started importing DITA and other XML files into structured FrameMaker, I was surprised by the excessive whitespace that appeared in the files. Even more surprising (in FrameMaker 8.0) were the red comments displayed via the EDD that said that some whitespace was invalid (these no longer appear in FrameMaker 9).
The whitespace was visible because of an odd decision by Adobe to handle all XML whitespace as if it were significant. (XML divides the world into significant and insignificant whitespace; most XML tools treat whitespace as insignificant except where it matters, such as in <codeblock> elements.) This approach to whitespace exists in both FrameMaker and InDesign.
At first I handled the whitespace on a case-by-case basis, removing it by hand or through regular expressions. Eventually, I realized this was a more serious problem and created an XSL transform to eliminate the whitespace as part of preprocessing. By writing XSL that Xalan accepts (not that hard), I could integrate the transform into a FrameMaker structured application.
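The white paper’s XSL is not reproduced here, but the core idea can be sketched in Python with the standard library’s ElementTree. This is roughly what xsl:strip-space elements="*" combined with xsl:preserve-space does in XSL: whitespace-only text nodes are dropped, except inside elements where whitespace matters. The PRESERVE set and the function names are assumptions for illustration; adjust the set for your DTD.

```python
import xml.etree.ElementTree as ET

# Assumption: elements whose whitespace is significant. Adjust for your DTD.
PRESERVE = {"codeblock", "pre", "lines", "screen", "msgblock"}

# XML whitespace is only these four characters (not, e.g., a non-breaking space).
XML_WS = " \t\r\n"


def is_blank(text):
    """True for a text node that contains nothing but XML whitespace."""
    return text is not None and text.strip(XML_WS) == ""


def strip_whitespace(elem, preserve=False):
    """Remove whitespace-only text nodes in place, except inside PRESERVE elements."""
    keep = preserve or elem.tag in PRESERVE
    if not keep and is_blank(elem.text):
        elem.text = None
    for child in elem:
        # In ElementTree, child.tail is text that belongs to elem (the parent),
        # so it follows elem's preserve setting, not the child's.
        if not keep and is_blank(child.tail):
            child.tail = None
        strip_whitespace(child, keep)
    return elem
```

One design caveat: like xsl:strip-space, this also removes a lone space between two inline elements (for example, <b>A</b> <i>B</i>), so you may want to limit stripping to block-level containers.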
I figured this whitespace problem must be affecting (and frustrating) more than a few of you out there, so I made the stylesheet available on the Scriptorium web site. I also wrote a white paper, “Removing XML whitespace in structured FrameMaker documents,” that describes the XSL that went into the stylesheet and how to integrate it with your FrameMaker structured applications.
The white paper is available on the Scriptorium web site. Information about how to download the stylesheet is in the white paper.
If the stylesheet and white paper are useful to you, let us know!
In a posting a few weeks ago I discussed how to ignore the DOCTYPE declaration when processing XML through XSL. What I left unaddressed was how to add the DOCTYPE declaration back to the files. Several people have told me they’re tired of waiting for the other shoe to drop, so here’s how to add a DOCTYPE declaration.
First off: the easy solution. If the documents you are transforming always use the same DOCTYPE, you can use the doctype-public and doctype-system attributes of the <xsl:output> element. When you specify these attributes, the XSL processor inserts the DOCTYPE automatically.
However, if the DOCTYPE varies from file to file, you’ll have to insert the DOCTYPE declaration from your XSL stylesheet. In DITA files (and in many other XML architectures), the DOCTYPE is directly related to the root element of the document being processed. This means you can detect the name of the root element and use standard XSL to insert a new DOCTYPE declaration.
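Outside of XSL, the same idea (let the root element pick the DOCTYPE) can be sketched in Python. This is an illustration, not the XSL technique itself; the DOCTYPES table and the doctype_for() name are hypothetical. A useful property here: ElementTree always hands back the document element as the root, even if a comment or processing instruction comes before it.

```python
import xml.etree.ElementTree as ET

# Public and system IDs for the four topic types used in the xsl:choose
# example; add an entry for each additional topic type you use.
DOCTYPES = {
    "topic": ("-//OASIS//DTD DITA Topic//EN", "topic.dtd"),
    "concept": ("-//OASIS//DTD DITA Concept//EN", "concept.dtd"),
    "task": ("-//OASIS//DTD DITA Task//EN", "task.dtd"),
    "reference": ("-//OASIS//DTD DITA Reference//EN", "reference.dtd"),
}


def doctype_for(xml_text: str) -> str:
    """Return a DOCTYPE declaration chosen by the document's root element name."""
    # fromstring() returns the root element even when comments or PIs precede it.
    root_name = ET.fromstring(xml_text).tag
    public_id, system_id = DOCTYPES[root_name]  # KeyError flags unmapped types
    return f'<!DOCTYPE {root_name} PUBLIC "{public_id}" "{system_id}">'
```

A root element that isn’t in the table raises KeyError, which at least flags topic types you haven’t mapped yet.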
Before you charge ahead and drop a DOCTYPE declaration into your files, understand that the DOCTYPE declaration is not valid XML. If you try to emit it literally, your XSL processor will complain. Instead, you’ll have to:
There are at least two possible approaches for adding DOCTYPE to your documents: use an <xsl:choose> statement to select a DOCTYPE, or construct the DOCTYPE using the XSL concat() function.
To insert the DOCTYPE declaration with an <xsl:choose> statement, use the name of the document’s root element to select which DOCTYPE declaration to insert. Note that the entities &lt; and &gt; aren’t HTML errors in this post; they are what you need to use. Also note that the DOCTYPE text in this template is left-aligned so that the output DOCTYPE declarations are left-aligned. Most parsers seem to tolerate whitespace before the DOCTYPE declaration, but I prefer to err on the side of caution:
<xsl:template match="/">
  <xsl:choose>
    <xsl:when test="name(/*) = 'topic'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd"&gt;
</xsl:text>
    </xsl:when>
    <xsl:when test="name(/*) = 'concept'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"&gt;
</xsl:text>
    </xsl:when>
    <xsl:when test="name(/*) = 'task'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd"&gt;
</xsl:text>
    </xsl:when>
    <xsl:when test="name(/*) = 'reference'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd"&gt;
</xsl:text>
    </xsl:when>
  </xsl:choose>
  <xsl:apply-templates select="node()"/>
</xsl:template>
The preceding example contains statements for the topic, concept, task, and reference topic types; if you use other topic types, you’ll need to add an <xsl:when> statement for each one. Rather than write a statement for each DOCTYPE, a more general approach is to process the name of the root element and construct the DOCTYPE declaration using the XSL concat() function.
<xsl:variable name="ALPHA_UC" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="ALPHA_LC" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="NEWLINE" select="'&#x0A;'"/>
<xsl:template match="/">
  <xsl:call-template name="add-doctype">
    <xsl:with-param name="root" select="name(/*)"/>
  </xsl:call-template>
  <xsl:apply-templates select="node()"/>
</xsl:template>
<!-- Create a DOCTYPE based on the root element. -->
<xsl:template name="add-doctype">
  <xsl:param name="root"/>
  <!-- Create an init-cap version of the root element name. -->
  <xsl:variable name="initcap_root">
    <xsl:value-of
      select="concat(translate(substring($root, 1, 1), $ALPHA_LC, $ALPHA_UC),
                     translate(substring($root, 2), $ALPHA_UC, $ALPHA_LC))"/>
  </xsl:variable>
  <!-- Build the DOCTYPE by concatenating pieces.
       Note that XSL syntax requires you to use &quot; entities for
       quotation marks (") inside the select attribute. -->
  <xsl:variable name="doctype"
    select="concat('!DOCTYPE ',
                   $root,
                   ' PUBLIC &quot;-//OASIS//DTD DITA ',
                   $initcap_root,
                   '//EN&quot; &quot;',
                   $root,
                   '.dtd&quot;')"/>
  <xsl:value-of select="$NEWLINE"/>
  <!-- Output the DOCTYPE surrounded by < and >. -->
  <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
  <xsl:value-of select="$doctype"/>
  <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>
The one caveat about this approach is that it depends on every DOCTYPE following the same pattern: a public ID of the form -//OASIS//DTD DITA Rootname//EN (with the root element name init-capped) and a system ID of rootname.dtd. If the public or system IDs for your various DOCTYPE declarations depart from that pattern, those differences may complicate the template.
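One way to soften that caveat, sketched here in Python rather than XSL: build the DOCTYPE from the pattern by default and keep a small table of exceptions. The OVERRIDES table and build_doctype() are illustrative. Bookmap is used as the example exception because its public ID spells the type as “BookMap”, which init-capping would turn into “Bookmap”; check the exact IDs against the DTDs for your DITA version.

```python
# Public IDs that don't follow the "-//OASIS//DTD DITA Rootname//EN" pattern.
# The bookmap entry is an example; verify IDs against your DITA DTDs.
OVERRIDES = {
    "bookmap": '<!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">',
}


def build_doctype(root: str) -> str:
    """Build a DOCTYPE the way the concat() template does, plus an exception table."""
    if root in OVERRIDES:
        return OVERRIDES[root]
    # Init-cap the root name: first letter upper case, the rest lower case,
    # like the translate()/substring() pair in the XSL.
    initcap = root[:1].upper() + root[1:].lower()
    return f'<!DOCTYPE {root} PUBLIC "-//OASIS//DTD DITA {initcap}//EN" "{root}.dtd">'
```

The same hybrid works in XSL: an <xsl:choose> whose <xsl:when> branches handle the exceptions, with the concat() construction in <xsl:otherwise>.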
So there you have it: DOCTYPEs in a flash. Just remember to use disable-output-escaping="yes" and use entities where appropriate, and you’ll be fine.
Recently I had to write some XSL transforms in which I wanted to ignore the DOCTYPE declarations in the source XML files. In one case, I didn’t have access to the DTD (and the files wouldn’t have validated even if I did). In the other case, the XML files were DITA files, but I had no need for, or interest in, validating them; I simply needed to run a transform that modified some character data in the files.
In the first case, I ended up writing a couple of sed scripts that removed and re-inserted the DOCTYPE declaration. By the time I encountered the second case, I wanted to do something less ham-fisted, so I started investigating how to direct Saxon to ignore the DOCTYPE declaration.
My first thought was to use the -x switch in Saxon. Perhaps I didn’t use it correctly, but I couldn’t get it to work. Even though I was using a non-validating parser (Piccolo), Saxon kept telling me that the DTD couldn’t be found.
I went back to the drawing board (aka Google) and found a note from Michael Kay that said, “to ignore the DTD completely, you need to use a catalog that redirects the DTD reference to some dummy DTD.” Michael provided a link to a very useful page in the Saxon Wiki that discussed using a catalog with Saxon. After a bit of experimentation, I got it working correctly. In this blog post, I’ve distilled the information to make it useful to others who need to ignore the DOCTYPE in their XSL.
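Before I describe the catalog implementation, I’d like to point out a simple solution. This solution works best when a set of XML files is in a single directory and all files use the same DOCTYPE declaration in which the system ID specifies a file:
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
In that case, you can place an empty file named topic.dtd in the same directory as the XML files. A non-validating parser finds the file, reads no declarations from it, and stops complaining about the missing DTD.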
However, there are many cases in which this simple solution doesn’t work. The system ID (“topic.dtd” in the previous example) might specify a path that cannot be reproduced on your machine…or the XML files could be spread across multiple directories…or there could be many different DOCTYPEs…or…
In these cases, it makes more sense to set up a catalog file. To specify a catalog with Saxon, you must use the XML Commons Resolver from Apache (resolver.jar). You can download the resolver from SourceForge. The good thing is, if you have the DITA Open Toolkit installed on your machine, you already have a copy of the resolver.jar file. The file is in %DITA-OT%\lib\resolver.jar. You specify the class path for the resolver in the Java command using the -cp switch (shown below).
The resolver requires you to specify a catalog.xml file, in which you map the public ID (or system ID) in the DOCTYPE declaration to a local DTD file. The catalog.xml file I created looks like this:
<catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<public publicId="-//OASIS//DTD DITA Topic//EN" uri="dummy.dtd"/>
<public publicId="-//OASIS//DTD DITA Concept//EN" uri="dummy.dtd"/>
<public publicId="-//OASIS//DTD DITA Task//EN" uri="dummy.dtd"/>
<public publicId="-//OASIS//DTD DITA Reference//EN" uri="dummy.dtd"/>
</catalog>
Putting it all together, I created a DOS batch file to run Java and invoke Saxon:
java -cp c:\saxon9\saxon9.jar;C:\DITA-OT1.4.3\lib\resolver.jar ^
-Dxml.catalog.files=catalog.xml ^
net.sf.saxon.Transform ^
-r:org.apache.xml.resolver.tools.CatalogResolver ^
-x:org.apache.xml.resolver.tools.ResolvingXMLReader ^
-y:org.apache.xml.resolver.tools.ResolvingXMLReader ^
-xsl:my_transform.xsl ^
-s:my_content.xml
The switches following the Java class (net.sf.saxon.Transform) are Saxon switches.
Note: I’m using Windows (DOS) syntax here. If you are using Unix (Linux, Mac), separate the paths in the class path with a colon (:) and use the backslash (\) as the line continuation character.
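If you script the transform instead of using a batch file, Python’s os.pathsep takes care of the class path separator for you (";" on Windows, ":" on Unix). This sketch only assembles the same command line as the batch file; saxon_command() is a hypothetical helper, and the jar paths you pass in depend on your installation.

```python
import os

RESOLVER = "org.apache.xml.resolver.tools"


def saxon_command(jars, xsl, source, catalog="catalog.xml"):
    """Return the java/Saxon command from the batch file as an argument list.

    os.pathsep is ";" on Windows and ":" on Unix, so the class path is built
    with the right separator for the current platform.
    """
    return [
        "java",
        "-cp", os.pathsep.join(jars),
        f"-Dxml.catalog.files={catalog}",
        "net.sf.saxon.Transform",
        f"-r:{RESOLVER}.CatalogResolver",
        f"-x:{RESOLVER}.ResolvingXMLReader",
        f"-y:{RESOLVER}.ResolvingXMLReader",
        f"-xsl:{xsl}",
        f"-s:{source}",
    ]
```

Pass the result to subprocess.run(cmd, check=True) to run it; because the arguments are a list, you don’t need a line continuation character on either platform.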
When you run Saxon this way, you’ll notice two things: first, Saxon doesn’t complain about the DTD (yay!); second, there is no DOCTYPE declaration in the output. I’ll address how to add the DOCTYPE declaration back to the output XML file in my next blog post.
I spend a good deal of time with a Windows cmd.exe window open on my desktop. If I’m not running the DITA OT, I’m testing some Perl script, or Ant, or Python, or who knows.
A few years ago (in the Windows 98 days), I discovered a nifty cmd window trick. People are consistently amazed when I demonstrate it to them. Now I’m going to share it with you.
Say you need to change directory to some long and gnarly path name. You could type the whole thing in. Or, if you have Windows Explorer open on your desktop, you can:
Hey presto! The path name is copied to the cmd window. What’s more, if there are spaces in the path, the path is automatically quoted.
Now you can click in the cmd window and press Enter to perform the command.
Cool! No more typing long path names for this ToolSmith.
This works for filenames too. If I’m running a Perl script that needs to work on a file way down my directory tree, I type “perl myScriptName.pl “, then drag and drop the file name from Windows Explorer into my cmd window.
I’ll keep adding more ToolSmith’s Tricks as I use them. What’s your favorite trick?
Which graphics formats should you use in your documentation? For print, the traditional advice is EPS for line drawings and TIFF for screen captures and photographs. That’s still good advice. These days, you might choose PDF and PNG for the same purposes. There are caveats for each of these formats, but in general, these are excellent choices.
Of course, everybody knows to stay away from WMF, the Windows Metafile Format. WMF doesn’t handle gradients, can’t have more than 256 colors, and refuses to play nice with anything other than Windows.
Think you’re too good to hang out with WMF? For your print and online documentation, perhaps. But it may be a great choice to give to your company’s PowerPoint users.
Are you familiar with this scenario? PowerPoint User saw some graphics in your documentation and thought they would work for some sales presentations. The screen captures are easy; you just give PowerPoint User PNGs or BMPs or whatever. It’s the line drawings that are the problem. PowerPoint User doesn’t have Illustrator and has never heard of EPS. PowerPoint User says, “Can you give me a copy of those pictures in a format that I can use in PowerPoint? Oh, and can you make that box purple and change that font for me first? And move that line just a little bit? And make that line thicker? And remove that entire right side of the picture and split it into two pictures?”
You want PowerPoint User to reuse the graphics; you’re all about reuse. But you have dealt with PowerPoint User before, and you know you will never get your real job done if you get pulled into the sucking vortex of PowerPoint User’s endless requests.
The secret is to give PowerPoint User the graphics in a format that can be edited from within PowerPoint (or Word): WMF. Here’s the drill that will make you a hero:
WMF. It will make PowerPoint User go away…happy!
Jeni Tennison has a new blog. Her latest post has tips on when to use template matching, named templates, and for-each statements.
In my experience, most people who are new to XSL overuse for-each loops, because they most closely resemble familiar programming constructs.