by David Kelly
In previous publications, Scriptorium staffers have made the case that open source software (“free”) is not necessarily cheap. Our experience with customizing the DITA Open Toolkit (OT) emphasizes this point daily. One question that always crops up is, “Is the DITA OT right for me, and how much will it cost to make it right?”
Okay, that’s two questions. We are talking about the OT, so expect the unexpected. (And for “the unexpected,” read “cost.”)
One of Scriptorium’s primary services is helping customers identify their individual requirements and devising optimal solutions. As you would expect, requirements and solutions vary from customer to customer. That being said, there are also some general principles that help to identify where the OT can be used “cheaply,” where it becomes expensive, and where there may be no other choice. The following is not intended as a precise guide for your particular situation (because there are always surprises). Rather, it is intended as food for thought and as an admonition to do your homework.
The temptation to use the DITA OT is that, as an open source tool, it is free. Also, it has widespread industry support, it provides sophisticated functionality such as localization support and conditional processing, and it provides a variety of output types “out of the box.” One convincing argument in favor of the OT is that it has ongoing support from the people who developed DITA (IBM). For all of these reasons, the DITA OT is a strong option for publishing DITA-based content.
The OT can produce adequate forms of output, such as HTML and CHM, with very little configuration. However, other forms of output invariably require customizations. Depending on the output type and the requirements, customizations can become very labor-intensive (read: costly). In other cases, the customizations may be minimal. However, to reiterate a theme: expect the unexpected.
What are some inexpensive modifications in the DITA OT? For starters, it is relatively easy to change the look and feel of a page in HTML, CHM, or PDF. HTML and CHM use the same stylesheets, so you get a two-for-one special on those. Font sizes, font colors, and a few things like indentation and alignment are fairly easy to modify using the attributes and attribute sets in the DITA OT. Notes and tables add a little more complexity, depending on what you are trying to do. For example, if you want to change how footnotes work for tables in PDFs, and you are using the FOP FO processor, then good luck. Footnotes are not supported in tables in FOP 0.94 (packaged with DITA OT 1.5). Another example is the note element in DITA, which supports mixed content (text and elements as siblings). Managing mixed content, depending on what you want to do, can be a pain.
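To make the "fairly easy" case concrete, here is a minimal sketch of an attribute-set override for PDF output. The attribute-set name topic.title and the idea of placing the override in a separate customization stylesheet are assumptions about how a given toolkit version is organized; check the attribute sets in your own installation before relying on either.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical customization stylesheet; the attribute-set name is an assumption -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- Override only the properties that should change -->
  <xsl:attribute-set name="topic.title">
    <xsl:attribute name="font-size">16pt</xsl:attribute>
    <xsl:attribute name="color">#003366</xsl:attribute>
    <xsl:attribute name="start-indent">0pt</xsl:attribute>
    <xsl:attribute name="text-align">start</xsl:attribute>
  </xsl:attribute-set>

</xsl:stylesheet>
```

Because XSLT merges attribute sets that share a name, an override like this only needs to list the properties being changed.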
Customizations for HTML and CHM are typically easier to implement than those for PDF in the DITA OT (with exceptions). Usually an HTML implementation does not require book-oriented functions, such as chapter numbers, prefixes and numbering on tables and figures, different formatting for different sections of the book, and so forth. Because HTML and CHM output typically have similar kinds of functions from customer to customer, such as an outlined table of contents and navigation buttons for next and previous topics, the functions in the DITA OT are often sufficient for general re-use. The look and feel can be changed with a CSS file, and off you go.
That being said, some publication functions can be problematic in HTML and CHM. Chapter numbering and figure numbering require knowledge of where a topic is located within the current document structure, and additional processing is required to do this. Another limitation worth mentioning is that the DITA OT uses Microsoft’s HTML Help Workshop for compiling the CHM files. HTML Help Workshop has not been updated for years, and it does not support Unicode. Workarounds for supporting Unicode for localized files can be quite complex.
(Edited version of photograph of American actor Edwin Booth as William Shakespeare’s Hamlet, circa 1870: Public Domain, http://en.wikipedia.org/wiki/File:Edwin_Booth_Hamlet_1870.jpg)
What sorts of things are expensive to do in the DITA OT? Answers for this question are regrettably plentiful, and a lot of them seem to involve PDF output. Or maybe it just seems this way to me because I customize PDF output on a daily basis. So here are a few items to consider.
Would you like indexing? If so, plan on using RenderX or Antenna House as your FO processor, because the free FOP processor provided with the DITA OT does not support standard XSL-FO indexing elements. Alternatively, you could develop your own indexing routines and processing strategy. Either way, plan on your budgetmeister pitching a fit. If you need to roll out DITA OT processing to multiple servers, the per-server licenses for the commercial products add up quickly. Writing an indexing routine from scratch is not cheap, but it may be your best option.
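For reference, the standard indexing markup in question comes from XSL-FO 1.1. The fragment below is a rough sketch of what a stylesheet would need to emit: an index-key on the indexed text and an fo:index-page-citation-list in the index entry. The key value and page geometry are invented for illustration, and a processor without XSL-FO 1.1 index support will not resolve the page numbers.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="body"
        page-width="8.5in" page-height="11in" margin="1in">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="body">
    <fo:flow flow-name="xsl-region-body">
      <!-- Mark the indexed term where it occurs in the text -->
      <fo:block>
        The <fo:inline index-key="toolkit">toolkit</fo:inline> is installed first.
      </fo:block>

      <!-- Index entry: the processor collects the pages that carry the key -->
      <fo:block break-before="page">
        toolkit
        <fo:index-page-citation-list>
          <fo:index-key-reference ref-index-key="toolkit"/>
        </fo:index-page-citation-list>
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Generating the keys, sorting and grouping the entries, and merging page ranges are all work that falls to your stylesheets or to a routine of your own.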
Do you need special book divisions or different headers and footers in different parts of your books? The stylesheets for setting up page sequences in the DITA OT can be mastered, eventually, but the pitfalls are more numerous than you might expect.
Do you need a different set of criteria for conditional inclusion of topics than is provided by the default DITA OT? Consider: if a topic is the target of a cross-reference, it will be included in the output even if that topic is excluded by the ditaval conditions. Does that inclusion represent a security violation for your implementation? Rewriting the conditional processing in the DITA OT can be done, but it is a serious challenge.
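As a hedged illustration of the scenario, suppose a DITAVAL file excludes an audience and a map marks one topic with that audience. The file names and the audience value are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical filter file: drop anything tagged audience="internal" -->
<val>
  <prop att="audience" val="internal" action="exclude"/>
</val>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical map: the second topic is meant to be filtered out -->
<map>
  <title>Product guide</title>
  <topicref href="overview.dita"/>
  <topicref href="internal-notes.dita" audience="internal"/>
</map>
```

If overview.dita contains an xref to internal-notes.dita, the behavior described above means the internal topic can still reach the output, so it is worth auditing cross-references before trusting the filter.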
Do you use a lot of interesting fonts? If you are using FOP, plan on some interesting font configuration work and set aside a nice chunk of time.
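The configuration work typically starts with registering each font in the FOP configuration file. The sketch below follows the pattern used by FOP releases of the 0.9x era, where each TrueType font also needs a metrics file generated ahead of time. The font name and paths are placeholders, and the exact requirements depend on your FOP version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <fonts>
        <!-- Placeholder font: metrics-url points to a pre-generated metrics file -->
        <font metrics-url="fonts/examplesans.xml"
              embed-url="fonts/examplesans.ttf" kerning="yes">
          <font-triplet name="ExampleSans" style="normal" weight="normal"/>
        </font>
        <!-- Each style and weight combination needs its own registration -->
        <font metrics-url="fonts/examplesans-bold.xml"
              embed-url="fonts/examplesans-bold.ttf" kerning="yes">
          <font-triplet name="ExampleSans" style="normal" weight="bold"/>
        </font>
      </fonts>
    </renderer>
  </renderers>
</fop>
```

Multiply this by every family, weight, and style in your design, and then make sure the stylesheets request the registered names.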
How about landscape pages in the middle of your chapters? The XSL-FO used during PDF processing requires that changes of page orientation occur in separate fo:page-sequence elements. It can be done, but it requires significant restructuring of the DITA OT’s XSL-FO scripts to make it happen.
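The constraint is easier to see in the XSL-FO itself. The following is a minimal sketch, with made-up master names and US Letter dimensions, of the output that the stylesheets would have to produce: the chapter is split into three page sequences so that the middle one can use a landscape page master.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="portrait"
        page-width="8.5in" page-height="11in" margin="1in">
      <fo:region-body/>
    </fo:simple-page-master>
    <!-- Landscape is simply a master with width and height swapped -->
    <fo:simple-page-master master-name="landscape"
        page-width="11in" page-height="8.5in" margin="1in">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="portrait">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Chapter text before the wide table.</fo:block>
    </fo:flow>
  </fo:page-sequence>

  <!-- The orientation change forces a new page sequence -->
  <fo:page-sequence master-reference="landscape">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Wide table on landscape pages.</fo:block>
    </fo:flow>
  </fo:page-sequence>

  <fo:page-sequence master-reference="portrait">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Chapter text after the wide table.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Writing the FO by hand is the easy part. The restructuring effort lies in getting stylesheets that emit one page sequence per chapter to break the flow at arbitrary points inside a topic.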
If you don’t use the DITA OT for PDF output, what are the alternatives? XMetaL uses a down-level version of the DITA OT for output – this version (22.214.171.124) can present additional work for certain types of functions. FrameMaker provides a robust non-DITA OT alternative, but the DITA application for importing DITA to FrameMaker comes with its own set of issues. Antenna House provides stylesheets for transforming DITA to PDF, but the build control that includes automated compiling of related links, conditional processing, and so forth, is not provided. There are no easy answers.
Other forms of output also may require significant modifications. The TROFF transforms in the DITA OT, for example, can be used as a basis for NROFF output, if you so choose, but significant changes are needed. And alternatives outside the DITA OT often don’t exist. So, is it still free? No, but it is a starting point.
As an anodyne to despair, let me point out that Scriptorium has, in the past, solved most of the problems used as examples in this entry. We enjoy a challenge. But we also would not want to disguise the fact that these problems can be challenges.
The point is to manage your expectations, and the expectations of your managers, with the understanding that “free is not necessarily cheap.” The DITA Open Toolkit is a complex bundle of software designed to be a starting point for solving a large, difficult, and diverse set of problems. The complexity and comprehensiveness of the DITA OT come with a built-in price. It will pay to perform careful research for your requirements before applying the DITA OT as a universal cure-all. The question of “To OT or not to OT” may not need the deep metaphysical deliberations of a Hamlet, but it is well worth a serious analysis.