To OT or not to OT?


by David Kelly

In previous publications, Scriptorium staffers have made the case that open source software (“free”) is not necessarily cheap. Our experience with customizing the DITA Open Toolkit (OT) emphasizes this point daily. One question that always crops up is, “Is the DITA OT right for me, and how much will it cost to make it right?”

Okay, that’s two questions. We are talking about the OT, so expect the unexpected. (And for “the unexpected,”  read “cost.”)

One of Scriptorium’s primary services is helping customers identify their individual requirements and devising optimal solutions. As you would expect, requirements and solutions vary from customer to customer.  That being said, there are also some general principles that help to identify where the OT can be used “cheaply,” where it becomes expensive, and where there may be no other choice.  The following is not intended as a precise guide for your particular situation (because there are always surprises).  Rather, it is intended as food for thought and as an admonition to do your homework.

The DITA OT is tempting because, as an open source tool, it is free. It also has widespread industry support, provides sophisticated functionality such as localization support and conditional processing, and produces a variety of output types “out of the box.”  One convincing argument in favor of the OT is that it has ongoing support from the people who developed DITA (IBM). For all of these reasons, the DITA OT is a strong option for publishing DITA-based content.

The OT can produce adequate output in some formats, such as HTML and CHM, with very little configuration. Other forms of output, however, invariably require customization.  Depending on the output type and the requirements, customizations can become very labor-intensive (read: costly).  In other cases, the customizations may be minimal.  However, to reiterate a theme: expect the unexpected.

What are some inexpensive modifications in the DITA OT?  For starters, it is relatively easy to change the look and feel of a page in HTML, CHM, or PDF.  HTML and CHM use the same stylesheets, so you get a two-for-one special on those.  Font sizes, font colors, and a few things like indentation and alignment are fairly easy to modify using the attributes and attribute sets in the DITA OT. Notes and tables add a little more complexity, depending on what you are trying to do.  For example, if you want to change how footnotes work for tables in PDFs, and you are using the FOP FO processor, then good luck.  Footnotes are not supported in tables in FOP 0.94 (packaged with DITA OT 1.5). Another example is the note element in DITA, which supports mixed content (text and elements as siblings).  Managing mixed content, depending on what you want to do, can be a pain.

Customizations for HTML and CHM are typically easier to implement in the DITA OT than customizations for PDF (with exceptions).  Usually an HTML implementation does not require book-oriented functions, such as chapter numbers, prefixes and numbering on tables and figures, different formatting for different sections of the book, and so forth.  Because HTML and CHM output typically need similar kinds of functions from customer to customer, such as an outlined table of contents and navigation buttons for next and previous topics, the functions in the DITA OT are often sufficient for general re-use.  The look and feel can be changed with a CSS file, and off you go.

That being said, some publication functions can be problematic in HTML and CHM. Chapter numbering and figure numbering require knowledge of where a topic is located within the current document structure, and additional processing is required to do this.  Another limitation worth mentioning is that the DITA OT uses Microsoft’s HTML Help Workshop for compiling the CHM files.  HTML Help Workshop has not been updated for years, and it does not support Unicode.  Workarounds for supporting Unicode for localized files can be quite complex.
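To make the numbering problem concrete, here is a minimal sketch in Python (not the DITA OT's own XSLT) of why position matters: a topic's number cannot be read from the topic itself and has to be computed by walking the map. The map content, the file names, and the number_topics helper are hypothetical, and a real map or bookmap carries far more structure than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified map. Real DITA maps carry more attributes and
# bookmaps use elements such as <chapter> rather than plain <topicref>.
DITAMAP = """
<map>
  <topicref href="intro.dita"/>
  <topicref href="install.dita">
    <topicref href="prereqs.dita"/>
    <topicref href="steps.dita"/>
  </topicref>
</map>
"""

def number_topics(parent, prefix=()):
    """Return {href: '2.1'-style number} derived from position in the map."""
    numbers = {}
    for position, ref in enumerate(parent.findall("topicref"), start=1):
        current = prefix + (position,)
        numbers[ref.get("href")] = ".".join(str(n) for n in current)
        # Recurse so nested topics inherit the parent's number as a prefix.
        numbers.update(number_topics(ref, current))
    return numbers

numbers = number_topics(ET.fromstring(DITAMAP))
for href, number in numbers.items():
    print(number, href)
```

The same topic file would get a different number in a different map, which is why this kind of numbering has to happen as an extra processing step rather than in the per-topic HTML transform.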

(Edited version of photograph of American actor Edwin Booth as William Shakespeare’s Hamlet, circa 1870: Public Domain, http://en.wikipedia.org/wiki/File:Edwin_Booth_Hamlet_1870.jpg)

What sorts of things are expensive to do in the DITA OT? Answers for this question are regrettably plentiful, and a lot of them seem to involve PDF output.  Or maybe it just seems this way to me because I customize PDF output on a daily basis.  So here are a few items to consider.

Would you like indexing?  If so, plan on using RenderX or Antenna House as your FO processor, because the free FOP processor provided with the DITA OT does not support standard XSL-FO indexing elements.  Alternatively, you could develop your own indexing routines and processing strategy.  Either way, plan on your budgetmeister pitching a fit.  If you need to roll out DITA OT processing to multiple servers, the per-server licenses for the commercial products add up quickly.  Writing an indexing routine from scratch is not cheap, but it may be your best option.

Do you need special book divisions or different headers and footers in different parts of your books?  The stylesheets for setting up page sequences in the DITA OT can be mastered, eventually, but the pitfalls are more numerous than you might expect.

Do you need a different set of criteria for conditional inclusion of topics than is provided by the default DITA OT?  Consider: if a topic is the target of a cross-reference, it will be included in the output even if that topic is excluded by the ditaval conditions.  Does that inclusion represent a security violation for your implementation?  Rewriting the conditional processing in the DITA OT can be done, but it is a serious challenge.
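If that cross-reference behavior worries you, one inexpensive precaution is a pre-flight check that runs before the build. The following Python sketch is an assumption-laden illustration, not part of the DITA OT: the topic content, file names, and helper functions are hypothetical, and the exclusion rule is simplified (a topic counts as excluded when every value of one of its attributes is marked exclude in the ditaval). It reports cross-references from included topics to excluded ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical ditaval and topics, held in strings to keep the sketch self-contained.
DITAVAL = '<val><prop att="audience" val="internal" action="exclude"/></val>'

TOPICS = {
    "overview.dita": '<topic id="overview"><title>Overview</title>'
                     '<body><p>See <xref href="secrets.dita"/>.</p></body></topic>',
    "secrets.dita": '<topic id="secrets" audience="internal">'
                    '<title>Internal notes</title></topic>',
}

def excluded_values(ditaval_xml):
    """Collect (attribute, value) pairs that the ditaval marks as exclude."""
    root = ET.fromstring(ditaval_xml)
    return {(p.get("att"), p.get("val"))
            for p in root.iter("prop") if p.get("action") == "exclude"}

def is_excluded(topic_root, excluded):
    """Simplified rule: excluded when every value of some attribute is excluded."""
    for att in {a for a, _ in excluded}:
        values = (topic_root.get(att) or "").split()
        if values and all((att, v) in excluded for v in values):
            return True
    return False

def risky_xrefs(topics, ditaval_xml):
    """List (source, target) pairs where an included topic links to an excluded one."""
    excluded = excluded_values(ditaval_xml)
    roots = {name: ET.fromstring(xml) for name, xml in topics.items()}
    hidden = {name for name, root in roots.items() if is_excluded(root, excluded)}
    findings = []
    for name, root in roots.items():
        if name in hidden:
            continue
        for xref in root.iter("xref"):
            target = (xref.get("href") or "").split("#")[0]
            if target in hidden:
                findings.append((name, target))
    return findings

findings = risky_xrefs(TOPICS, DITAVAL)
print(findings)
```

A check like this only flags the problem. Deciding what to do with each flagged link (remove it, retarget it, or accept it) is still a content decision, and it is far cheaper than rewriting the toolkit's conditional processing.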

Do you use a lot of interesting fonts?  If you are using FOP, plan on some interesting font configuration work and set aside a nice chunk of time.

How about landscape pages in the middle of your chapters?  The XSL-FO used during PDF processing requires that changes of page orientation occur in separate fo:page-sequence elements.  It can be done, but it requires significant restructuring of the DITA OT’s XSL-FO scripts to make it happen.
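To show what that constraint looks like in the output, here is a small Python sketch that builds a bare-bones XSL-FO document in which each change of orientation starts a new fo:page-sequence. It is an illustration only, not the DITA OT's stylesheets: the master names "portrait" and "landscape", the page sizes, and the build_document helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)

def fo(tag):
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO}}}{tag}"

def build_document(sections):
    """sections is a list of (orientation, text) pairs, in reading order."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    # One page master per orientation; the names are arbitrary labels.
    sizes = {"portrait": ("8.5in", "11in"), "landscape": ("11in", "8.5in")}
    for name, (width, height) in sizes.items():
        master = ET.SubElement(masters, fo("simple-page-master"), {
            "master-name": name, "page-width": width, "page-height": height})
        ET.SubElement(master, fo("region-body"), {"margin": "1in"})
    # Each section gets its own page sequence pointing at the matching master.
    for orientation, text in sections:
        seq = ET.SubElement(root, fo("page-sequence"),
                            {"master-reference": orientation})
        flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
        ET.SubElement(flow, fo("block")).text = text
    return root

doc = build_document([
    ("portrait", "Chapter text before the wide table."),
    ("landscape", "The wide table."),
    ("portrait", "Chapter text after the wide table."),
])
print(ET.tostring(doc, encoding="unicode"))
```

One chapter becomes three page sequences here. In the toolkit, the stylesheets that decide where page sequences begin and end have to be reworked to split a chapter this way, and page numbering and running headers have to carry across the splits, which is where the effort goes.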

If you don’t use the DITA OT for PDF output, what are the alternatives?  XMetaL uses a down-level version of the DITA OT (1.4.2.1) for output, and this older version can mean additional work for certain types of functions.  FrameMaker provides a robust non-DITA OT alternative, but the DITA application for importing DITA to FrameMaker comes with its own set of issues. Antenna House provides stylesheets for transforming DITA to PDF, but the build control, which includes automated compiling of related links, conditional processing, and so forth, is not provided. There are no easy answers.

Other forms of output also may require significant modifications.  The TROFF transforms in the DITA OT, for example, can be used as a basis for NROFF output, if you so choose, but significant changes are needed.  And alternatives outside the DITA OT often don’t exist, so — is it still free? No, but it is a starting point.

As an anodyne to despair, let me point out that Scriptorium has, in the past, solved most of the problems used as examples in this entry.  We enjoy a challenge.  But we also would not want to disguise the fact that these problems can be challenges.

The point is to manage your expectations, and the expectations of your managers, with the understanding that “free is not necessarily cheap.”  The DITA Open Toolkit is a complex bundle of software designed to be a starting point for solving a large, difficult, and diverse set of problems.  The complexity and comprehensiveness of the DITA OT come with a built-in price.  It will pay to research your requirements carefully before applying the DITA OT as a universal cure-all.  The question of “To OT or not to OT” may not need the deep metaphysical deliberations of a Hamlet, but it is well worth a serious analysis.

7 Comments on “To OT or not to OT?”

  1. Interesting post. Given the challenges of customizing DITA output, do you have any thoughts about using the stylesheets provided in the OT to transform the source to DocBook? That way, you could take advantage of the DB stylesheets, which tend to be much more robust.

    – Ben

  2. Our experience has been with going from DocBook to DITA rather than the other direction. Typically people who want to use the DocBook stylesheets are already in DocBook, while people who are using or migrating to DITA are looking for the chunked, topic-oriented authoring that DITA provides. Once in DITA, it usually makes sense to use DITA-specific stylesheets. This is especially true in DITA-oriented CMS systems that are bundled with the DITA OT — adding DocBook to the mix could make such systems quite bulky and fragile.

    The output of both sets of stylesheets may be equivalent, but the interim handling of topics during processing may be different. For instance, I don’t know how or whether DocBook generates related links during processing. I’m also not familiar enough with the DocBook stylesheets to know whether customizing them is any better or worse than customizing the DITA OT stylesheets. I do know that the DocBook stylesheets are extensive.

    I can say that the DocBook-to-DITA transforms (going the other direction) do not give complete support of the DocBook element set; I’m not sure about the case of going DITA-to-DocBook. Technically, my main concern would be how specialized elements are handled during the conversion to DocBook. The stylesheets currently in the DITA OT match on the class attribute, so inheritance should address most of the elements, but unsupported specialized elements will be interpreted as identical to elements higher up in the inheritance chain. Possibly that is not what is wanted, in which case additional customization would be needed to handle the specialized elements.

    None of this is to say that it couldn’t work, though you would need to make a choice about whether, starting from DITA content, you want to spend your effort maintaining DocBook or DITA stylesheets. My own preference is to have as few moving parts in a system as possible.

    Although the DITA OT can sometimes be difficult to customize, I have not heard of a situation where conversion to DocBook and adoption of DocBook stylesheets provided a compelling solution to a problem in the DITA OT. However, I’d be interested in hearing of situations where this might be the case.

    Thanks for the comment – it’s an interesting thought!

  3. Wow — thanks for such an informative post. To a layperson (i.e., typical client) it can seem weird that things that appear hard are easy to do in the OT, and other things that appear easy are actually hard.

    Perhaps as a follow up you can discuss some alternatives, when the answer to “Is the DITA OT right for me” is no. What other options exist, and what are the pros and cons?

  4. The OT indeed has its issues, especially for those who are not XSLT experts. Another free alternative is DITA2Go, which uses the same fast back-end code as Mif2Go (which is not free). Nonprogrammers will find it a lot easier to use. It provides a different way of making PDFs than the OT; instead of making XSL-FO, which requires commercial rendering engines like XEP or Antenna House for real work, it makes Word files. So you can easily see what you are getting, tweak it if necessary, make a PDF, and ship… before the OT finishes one pass. Worth a look. 😉

  5. David,

    Thank you, that’s a helpful explanation. I think you also hit on another interesting aspect when you say that those who want to use the DB stylesheets are already using DB, and those who want the topic-based chunking tend to use DITA. With the advent of DocBook 5, DocBook now supports modular “chunking” and relationship mapping using assemblies (and, of course, it has supported topic-based authoring for some time through the use of ). On the flip side, DITA now supports book maps. So it would seem that both languages now provide technical communicators and publishers with the flexibility to implement a structured authoring environment that can support both linear and modular authoring models. In other words, it would appear that both DITA and DB are evolving toward each other.

  6. Scriptorium has a white paper on alternative PDF paths. We reorganized our website recently, so at the moment it’s not immediately obvious where to find it, so:

    http://www.scriptorium.com/whitepapers/dita2pdf/dita2pdf.pdf

    The paper was written before DITA2Go was released, and yes, sorry, I did miss DITA2Go in my short list of alternatives. DITA2Go certainly has a place in the world of DITA and PDFs, and Jeremy is much more qualified than I am to enumerate its benefits. Perhaps at some point we need to update our paper.

    Determining when to use a specific option is trickier to pin down. Typically it involves a good bit of analysis for specific situations. It’s something to keep in mind, though; we’re always scratching our heads for new topics.

    Thanks for the responses!

  7. Ben,

    I would agree with your assessment.

    One thing I also wanted to mention (thought of it as I drove home last night): another technical limitation of the DITA2DB stylesheets in the DITA OT is that they don’t support conversion of bookmaps or ditamaps, so the books would either have to be reinstantiated in DocBook or scripts would need to be developed to perform the conversion.

    Regards!
