January 6, 2006

To DITA or not to DITA?

In a recent discussion on the STCCIC-SIG list, Mark Baker of Analecta Communications provided an excellent analysis of DocBook, DITA, and how they are not the same thing as XML. (The discussion is reproduced here with Mark’s permission.)

The original question was this:

How specifically does one use DITA? Is it software or a set of rules used with other XML-compatible tools, such as Arbortext’s Epic?

In response, Mark wrote:

“Answering this question requires some background.

A conventional desktop publishing package is an all-in-one tool that ties together all the components of a publishing tool chain into a single monolithic program. (This is one reason why each of them is problematic in its own way: Frame has a lousy word processor, for instance, and Word has a flaky formatting engine.) We are so used to these systems that we tend to think of publishing as a single operation.

Publishing is actually a complex chain of processes. When you get into the XML arena, elements of that chain are often decoupled from each other. Some commercial products, like AuthorIT, package the whole authoring chain, much like Frame and Word do. Others address only part of the chain, allowing you to select or build other tools to do the rest of the work. These provide much more flexibility and potentially greater simplicity (since you don’t have elements you don’t need getting in your way, as you do in tools like Word or Frame), but they require more work, and more thought, to set up.

DocBook, for instance, is basically a DTD for general technical documentation applications. It’s really just a file format; everything else is up to you. There is a DocBook tool chain you can use if you want to: a collection of formatting transforms and other tools contributed by various people over time and made available as open source. You can use the DocBook DTD without using the DocBook tool chain. Alternatively, you can write using a wholly different DTD and then transform your content into DocBook (using an XSLT script, for instance) in order to use the formatting and publishing tools in the DocBook tool chain. The point is, you can assemble whatever tool chain you want from the available pieces.
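The kind of conversion described here can be sketched in a few lines. This minimal sketch uses Python’s standard `xml.etree.ElementTree` rather than XSLT, and the source vocabulary (`task`, `heading`, `action`) is a hypothetical in-house DTD; the output uses the real DocBook elements `section`, `title`, `procedure`, `step`, and `para`. It shows the idea only and is not a substitute for a complete transform.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-house vocabulary: a task topic with a heading and actions.
SOURCE = """
<task>
  <heading>Replace the filter</heading>
  <action>Unplug the unit.</action>
  <action>Slide out the old filter.</action>
</task>
"""

def task_to_docbook(task: ET.Element) -> ET.Element:
    """Map the hypothetical <task> vocabulary onto DocBook elements."""
    section = ET.Element("section")
    title = ET.SubElement(section, "title")
    title.text = task.findtext("heading")
    procedure = ET.SubElement(section, "procedure")
    # Each in-house <action> becomes a DocBook <step> wrapping a <para>.
    for action in task.findall("action"):
        step = ET.SubElement(procedure, "step")
        para = ET.SubElement(step, "para")
        para.text = action.text
    return section

docbook = task_to_docbook(ET.fromstring(SOURCE))
print(ET.tostring(docbook, encoding="unicode"))
# <section><title>Replace the filter</title><procedure><step><para>Unplug the unit.</para></step><step><para>Slide out the old filter.</para></step></procedure></section>
```

Once content is in this shape, the DocBook formatting tools can take over from there.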

If you decide to write in DocBook, you can choose any editor you like. I prefer a text editor with good XML support, like jEdit. Others prefer a structured editor such as Arbortext. Some structured editors come with WYSIWYG front ends that make working in DocBook look and feel like working in Word. (Shop carefully! You may not get that same friendly interface for other DTDs.)

Some documentation-oriented content management systems offer support for DocBook, meaning that they essentially create a complete integrated publishing chain based on DocBook. So, if you decide to go the DocBook route, you can build all your tools from scratch and let each author use whatever editor suits them best. Or you can assemble a system by selecting the individual pieces of the publishing chain that best fit your needs from a combination of commercial and freeware offerings. Or you can go out and buy a single integrated DocBook-based authoring system. Remember that how you assemble the tool chain is probably more important, both to the success of your system and to solving your business problems, than the selection of the underlying file format.

Now, on to DITA. DITA is an odd hybrid. Unlike DocBook, it is more than a DTD, but it is less than a generic publishing system like FrameMaker. First of all, DITA is topic-based. It is about supporting authoring in topics rather than documents, and was first intended for producing help systems. However, topic-based systems can also be used to produce documents, by combining topics.
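To make the idea of building a document out of topics concrete, here is a minimal Python sketch. The topic store, the topic IDs, and the `document` wrapper element are hypothetical; DITA itself does this with map files that reference topics, and the plain ordered list below only stands in for such a map.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic store: each topic is a small, self-contained XML unit.
TOPICS = {
    "intro": "<topic id='intro'><title>Introduction</title><body><p>What the product does.</p></body></topic>",
    "install": "<topic id='install'><title>Installing</title><body><p>Run the installer.</p></body></topic>",
    "faq": "<topic id='faq'><title>FAQ</title><body><p>Common questions.</p></body></topic>",
}

# Stand-in for a map: the ordered list of topics that make up one deliverable.
USER_GUIDE = ["intro", "install"]

def assemble(topic_ids, store):
    """Combine the listed topics, in order, under one document element."""
    document = ET.Element("document")
    for topic_id in topic_ids:
        document.append(ET.fromstring(store[topic_id]))
    return document

guide = assemble(USER_GUIDE, TOPICS)
print([t.findtext("title") for t in guide])  # ['Introduction', 'Installing']
```

The same topics can feed a help system one topic at a time, while a different list selects and orders them for a printed guide.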

DITA also recognized the important truth that a small topic-specific DTD is easier to use and to validate than a big generic document-oriented DTD like DocBook. But the problem with topic-based systems that use multiple DTDs is this: how do you manage the relationship between the DTDs, and how do you add a new DTD without having to rewrite your whole tool chain?

DITA’s answer is twofold. First, it uses a simple mapping mechanism to create a new DTD as a “specialization” of an existing DTD. Second, it uses a trick based on the way XSLT works so that a transform written for the base DTD also works for the specialized DTD, and so that you can write a transform for just the specific properties of the new DTD while inheriting, without change, the transforms written for the old DTD.
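The inheritance trick can be imitated outside XSLT to show how it works. In DITA, each element carries a `class` attribute listing its ancestry; the task type’s `steps` element, for example, carries `- topic/ol task/steps `, because it is specialized from the base `ol`. The Python sketch below picks the most specialized handler it has for an element and otherwise falls back toward the base type, much as DITA’s XSLT templates match on tests like `contains(@class, ' topic/ol ')`. The handler functions and the `HANDLERS` table are hypothetical. In real DITA files the `class` value normally comes from a DTD default rather than being typed by the author; it is written inline here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# The class attribute lists the element's ancestry, base type first.
SOURCE = """
<task class="- topic/topic task/task ">
  <steps class="- topic/ol task/steps ">
    <step class="- topic/li task/step ">Open the case.</step>
  </steps>
</task>
"""

# Handlers written once, for the base topic types only.
def render_topic(el, recurse):
    return "".join(recurse(child) for child in el)

def render_ol(el, recurse):
    return "<ol>" + "".join(recurse(child) for child in el) + "</ol>"

def render_li(el, recurse):
    return "<li>" + (el.text or "") + "</li>"

# Hypothetical dispatch table keyed by "module/element" tokens.
HANDLERS = {"topic/topic": render_topic, "topic/ol": render_ol, "topic/li": render_li}

def render(el):
    """Use the most specialized handler available, falling back toward the base type."""
    tokens = el.get("class", "").split()[1:]  # drop the leading "-" or "+"
    for token in reversed(tokens):            # most specialized token first
        if token in HANDLERS:
            return HANDLERS[token](el, render)
    raise ValueError(f"no handler for <{el.tag}>")

doc = ET.fromstring(SOURCE)
print(render(doc))  # <ol><li>Open the case.</li></ol>

# Special behavior for the specialization alone; the base handlers are untouched.
HANDLERS["task/steps"] = lambda el, r: '<ol class="steps">' + "".join(r(c) for c in el) + "</ol>"
print(render(doc))  # <ol class="steps"><li>Open the case.</li></ol>
```

The specialized `steps` and `step` elements render correctly with no new code at all, and adding a `task/steps` entry overrides only that one element. The sketch also reflects the limitation discussed next: the fallback works only because every processor understands the same `class` convention.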

DITA thus limits you in two ways. First, the relationship between new DTDs and existing DTDs can only be expressed by a very simple mapping mechanism. This limits how specific the new DTDs can be to the task you want them to perform. Second, it means that all your processing has to be done in XSLT. You can’t insert other tools into the DITA tool chain because the inheritance of processing rules will not work for other languages. Also, existing XSLT transforms cannot be incorporated into the DITA tool chain; they have to be specifically written for DITA.

As with DocBook, there is a publicly available DITA tool chain that you can use. There is also a base DITA topic DTD and a set of standard specializations that you can use. DITA also includes a collection of standard ways for mapping and linking topics. You can take as much or as little from the DITA tool kit as you want, but if you don’t take all of it, you may lose compatibility with other DITA systems or updates to the toolkit.

DITA documents are just XML documents, so you can use any editor you like, just as with DocBook. Just as with DocBook, there are commercial packages that advertise that they support DITA. So you can build your own DITA tool chain, use the available open source DITA tool chain, mix and match elements of a tool chain from different sources, or buy a commercial package (though I am not sure how complete or integrated these are yet).

Personally, I dislike DITA. Its mapping mechanism is not only crude and underpowered, it is also a hack that misuses the DTD attribute mechanism of XML. Its lock-in to XSLT is also a problem. XSLT is a great tool, but there are problems for which other tools are better solutions. (Since I don’t like it, I don’t use it, so I expect DITA fans will chime in here to defend its merits and challenge my statements about its weaknesses.)

I prefer what I call a network approach. Like DITA, this approach tends to use small topic-oriented DTDs (though it is not limited to them). Where it differs is in the approach to reusing formatting code. The network approach breaks the processing down into separate steps for synthesis, presentation, formatting, and encoding. To add a new DTD for a specific topic or application, you only have to change the synthesis code to incorporate the new DTD. The presentation, formatting, and encoding code remains unchanged. Its strength is that there are no limits on how the synthesis, presentation, formatting, or encoding are done, and thus no limits on how you construct and relate your DTDs or which tools you use to do your transforms. However, there are currently no commercial or open source tool kits for this approach, so you are on your own for building the tool chain.”
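The four-stage structure described above can be sketched as a chain of plain functions. Everything in this sketch is hypothetical: the stage names come from the description, but the intermediate data (lists of `(kind, text)` pairs), the two source vocabularies, and the function names are invented for illustration. What it shows is that supporting a new DTD means adding one entry to `SYNTHESIZERS`, while `present`, `format_blocks`, and `encode_html` stay as they are.

```python
import xml.etree.ElementTree as ET
from html import escape

# Synthesis: one small function per source DTD, each producing the same
# neutral structure (here, a list of (kind, text) blocks).
def synthesize_task(root):
    blocks = [("heading", root.findtext("title"))]
    blocks += [("step", s.text) for s in root.iter("step")]
    return blocks

def synthesize_api(root):  # a second, hypothetical topic-specific DTD
    return [("heading", root.findtext("name")), ("para", root.findtext("summary"))]

SYNTHESIZERS = {"task": synthesize_task, "api": synthesize_api}

# Presentation: decide how content is presented (here, steps get numbered).
def present(blocks):
    out, n = [], 0
    for kind, text in blocks:
        if kind == "step":
            n += 1
            kind, text = "para", f"{n}. {text}"
        out.append((kind, text))
    return out

# Formatting: map presentation blocks to formatting objects.
def format_blocks(blocks):
    styles = {"heading": "h1", "para": "p"}
    return [(styles[kind], text) for kind, text in blocks]

# Encoding: serialize formatting objects in one output syntax (here, HTML).
def encode_html(formatted):
    return "".join(f"<{tag}>{escape(text)}</{tag}>" for tag, text in formatted)

def publish(xml_text):
    root = ET.fromstring(xml_text)
    blocks = SYNTHESIZERS[root.tag](root)  # the only DTD-specific step
    return encode_html(format_blocks(present(blocks)))

print(publish("<task><title>Reboot</title><step>Save work.</step><step>Restart.</step></task>"))
# <h1>Reboot</h1><p>1. Save work.</p><p>2. Restart.</p>
print(publish("<api><name>open()</name><summary>Opens a file.</summary></api>"))
# <h1>open()</h1><p>Opens a file.</p>
```

Because each stage is an ordinary function with a plain input and output, any stage could be swapped for a tool written in another language, which is the freedom the network approach is meant to preserve.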

I would add to this only that you cannot currently specialize DITA by adding your own attributes. But attributes and metadata are one of the most important advantages of XML-based publishing, and the inability to add attributes that reflect your organization’s content requirements is a showstopper for me and for most of our clients.