It’s not easy being…clean
published in STC Intercom, March 2011
A standards-based workflow is challenging. This article discusses the issues with DITA (an XML standard for technical communication content) and XSL-FO (Extensible Stylesheet Language Formatting Objects, a standard used to create PDF from XML; see http://www.w3.org/standards/xml/publishing).
There are numerous software applications available that “support” DITA. This claim could mean that you can:
- Export to DITA.
- Import DITA content.
- Author DITA files.
- Save files in DITA.
- Validate your content against the DITA specification.
- Use DITA components, such as content references, DITA map files, and links.
- Create files for any specialization of DITA. (Specialization is a particular way of adding new elements.)
If you need a DITA authoring tool, pay attention to support for the following:
- DITA version. Does the tool support the version of DITA that you need? The latest version of DITA is 1.2, but many people are authoring in DITA 1.1 or DITA 1.0.
- Validation. Does the tool produce valid DITA content; that is, content that follows all of the rules of the DITA specification?
- Specialization support. If you are going to specialize your DITA structure, does your software support specialized content?
- Links. Does your tool provide a way to manage the links and reusable content that you’ll create?
The first major challenge in using a standard is to make sure that the software you use provides the support you need.
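As a rough illustration of the validation point above, the following Python sketch runs two basic checks on a topic: that the root element has an id attribute and that its first child is a title. It is not a substitute for validating against the DITA DTDs or schemas, which a real tool would do. The sample topic and the function name basic_topic_problems are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topic, invented for this illustration.
SAMPLE_TOPIC = """<topic id="clean_topic">
  <title>Sample topic</title>
  <body><p>Standards-based workflows are challenging.</p></body>
</topic>"""


def basic_topic_problems(xml_text):
    """Return problems found by two basic checks (not full DITA validation)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Content that is not well-formed XML cannot be valid DITA.
        return [f"not well-formed: {err}"]

    problems = []
    if not root.get("id"):
        problems.append("root element has no id attribute")
    children = list(root)
    if not children or children[0].tag != "title":
        problems.append("first child of the root element is not <title>")
    return problems


if __name__ == "__main__":
    print(basic_topic_problems(SAMPLE_TOPIC))
    print(basic_topic_problems("<topic><body/></topic>"))
```

A check like this can catch obvious problems in exported files, but only validation against the specification's own DTDs or schemas answers the question of whether a tool produces valid DITA.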
Using a standard in non-standard ways
Some tools use standards incorrectly, but the most common culprits are authors. When authors want to organize their content in a way that the standard does not support, they tend to resort to creative solutions. “Tag abuse” results when the markup in the file does not accurately reflect the structure of the content. An example of tag abuse is using a two-cell table without borders to create a two-column bulleted list. (The correct way to implement this in a DITA workflow would be to mark up the content as a list and write a rule in the PDF output generator that formats the list in two columns.)
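The table example can be turned into a simple check. The Python sketch below flags tables that have no frame and whose cells each contain nothing but a bulleted list. This is only a heuristic under stated assumptions: it looks at the frame attribute alone (not rowsep or colsep), it would also flag a legitimate borderless table of lists, and the sample markup and function names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first table is a layout trick, the second is a real table.
SAMPLE = """<body>
  <table frame="none">
    <tgroup cols="2">
      <tbody>
        <row>
          <entry><ul><li>Export</li><li>Import</li></ul></entry>
          <entry><ul><li>Author</li><li>Validate</li></ul></entry>
        </row>
      </tbody>
    </tgroup>
  </table>
  <table frame="all">
    <tgroup cols="2">
      <tbody>
        <row><entry>Tool</entry><entry>DITA version</entry></row>
      </tbody>
    </tgroup>
  </table>
</body>"""


def looks_like_list_layout(table):
    """True if the table is borderless and every cell holds only one <ul>."""
    entries = table.findall("./tgroup/tbody/row/entry")
    if table.get("frame") != "none" or not entries:
        return False
    for entry in entries:
        children = list(entry)
        if len(children) != 1 or children[0].tag != "ul":
            return False
        # Any text beside the list means the cell is more than a list wrapper.
        if (entry.text or "").strip() or (children[0].tail or "").strip():
            return False
    return True


def find_suspect_tables(xml_text):
    """Return the <table> elements that look like list layout tricks."""
    root = ET.fromstring(xml_text)
    return [t for t in root.iter("table") if looks_like_list_layout(t)]


if __name__ == "__main__":
    print(len(find_suspect_tables(SAMPLE)), "suspect table(s) found")
```

A report like this gives an editor a list of places to review; whether a flagged table really is tag abuse remains a human judgment.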
Standards conformance
Standards conformance is another challenge. Conformance has been a particular issue with XSL-FO. The open-source FOP software does not implement all of the XSL-FO specification (http://xmlgraphics.apache.org/fop/compliance.html). If you are writing XSL-FO code for use in FOP, you must keep in mind FOP’s limitations in addition to the constraints of the markup language itself.
On the commercial side, you have RenderX and Antenna House processors, which implement more of the XSL-FO specification than FOP. In addition, RenderX and Antenna House both provide extensions; these are proprietary codes that provide additional functionality. You may have to sacrifice your goal of conforming to the standard in order to make the PDF output look the way you want.
Storing processor-specific information
XML itself provides processing instructions (PIs), which are intended for storing application-specific information. In the tech comm world, examples of PIs are conditional text settings (FrameMaker) and reviewer information (XMetaL). Processing instructions are typically not usable across applications, so using them locks you into a particular application. Technically, the PIs leave you with XML files that still conform to the standard. However, you may still lose some information if you move from one tool to another.
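Before moving content from one tool to another, it can help to see which processing instructions the files actually contain. The Python sketch below lists every PI in a document using the standard library's minidom parser. The PI names in the sample (hypothetical-tool, hypothetical-review) are invented for this example and do not correspond to what FrameMaker or XMetaL actually write.

```python
from xml.dom import minidom

# Hypothetical sample; the PI targets and data are invented for this illustration.
SAMPLE = """<?xml version="1.0"?>
<topic id="t1">
  <title>Sample</title>
  <?hypothetical-tool condition="print-only"?>
  <body><p>Text<?hypothetical-review author="reviewer-1"?></p></body>
</topic>"""


def list_processing_instructions(xml_text):
    """Return (target, data) pairs for every PI, in document order."""
    doc = minidom.parseString(xml_text)
    found = []

    def walk(node):
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            found.append((node.target, node.data))
        for child in node.childNodes:
            walk(child)

    walk(doc)
    return found


if __name__ == "__main__":
    for target, data in list_processing_instructions(SAMPLE):
        print(target, "->", data)
```

The XML declaration on the first line is not a processing instruction, so it does not appear in the result. Each remaining target is a candidate for information that another tool may ignore.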
The slow pace of change in standards
If a standard is missing a feature you need, you have a number of unappealing choices. You can:
- “Extend” (some would say “break”) the standard to add what you need.
- Live without the feature.
- Wait until the feature is added to the standard.
The standards bodies move slowly. You can expect a minimum of two years between standards releases. You may not be able to wait until the feature you need is added to the standard, which again puts you in a difficult position.
Living in a standards-based world
The world of standards is full of compromises, but it is an improvement over the alternatives. Although moving XML content from one authoring tool to another is not completely seamless, it is far easier than converting among proprietary file formats. XSL-FO can be frustrating, but the ability to automate the PDF workflow is compelling.
Kai
Thanks, Sarah, for this interesting article. The last two paragraphs pretty much sum up our considerations – before we decided to use a standard in yet another non-standard way: We essentially took the “Liberace version” of DITA 1.1 (i.e., without the boring parts which we don’t need) and created our own information model with it. This way, we have a DITA-compliant content structure.
The reason we did this is to be fairly close to DITA, should we decide to use it in the distant future, while we can now go ahead with a cool XML-based HAT that does what we need. Yes, we lose the validation of our content, but that’s, uh: another compromise in the world of not-quite standards…?
G.
Quick note to say this is an excellent summary of the DITA topic, business case, challenges and I appreciate the concise business case you included as a reference. I worked with DITA for several years and was a convert at hello (maybe Epic took a while to like/love).
I learned a bit about Quark plugins to Word which restrict Word documents to constructs that can be saved as DITA XML. Really. Maybe this is of interest to you – see the Quark publishing web site for more information.
Again, thanks for an excellent posting!