September 17, 2012

Perils of DITA publishing, part 3: Indexing

In which we are boxed in by the limitations of DITA indexing support.

Image: Indexing in DITA sometimes feels like you’re shoving a square peg in a round hole (photo: flickr/Jeff Sandquist)

Dilemmas abound when you’re indexing DITA content. This third installment in our Perils of DITA publishing series explains how Sarah O’Keefe and I handled some indexing challenges while developing our latest book, Content Strategy 101.

Where should I place index entries in my source DITA files?

Advice here and there on the web recommends against putting indexterm elements within the body of a topic where the referenced content occurs. Instead, guidelines suggest placing the indexterm elements in one of two locations:

  • In the prolog of a topic
  • In the topicref in a ditamap

Those recommendations make sense for maximizing topic reuse and streamlining localization. They are problematic, however, if PDF/print is a primary deliverable (as the previously mentioned industry advice also points out). Putting indexterm elements in the prolog or topicref creates entries pointing to the start of a topic in PDF output. If that topic breaks across pages, an index entry intended for information at the end of the topic points to the top of the topic on the previous page.
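For reference, here is a rough sketch of what those two recommended placements look like in markup. The topic id, title, file name, and index text are invented for illustration; only the element structure matters. First, an entry in the prolog of a topic:

```xml
<!-- Hypothetical topic: the index entry sits in the prolog, not the body -->
<topic id="content_audit">
  <title>Auditing your content</title>
  <prolog>
    <metadata>
      <keywords>
        <indexterm>content audits</indexterm>
      </keywords>
    </metadata>
  </prolog>
  <body>
    <p>Start with an inventory of the content you already have.</p>
  </body>
</topic>
```

And the same entry in a topicref in a ditamap:

```xml
<!-- Hypothetical map fragment: the index entry travels with the topicref -->
<topicref href="content_audit.dita">
  <topicmeta>
    <keywords>
      <indexterm>content audits</indexterm>
    </keywords>
  </topicmeta>
</topicref>
```

In both cases the entry is attached to the topic as a whole, which is why the page reference in PDF output lands at the start of the topic.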

We opted to place our indexterm elements within the body of topics so the PDF file we sent to the printer would have precise index entries. Reuse and localization were not our priorities for this book. Happy index users were.
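As a minimal sketch of that approach (the paragraph and index text are made up, not taken from the book), an indexterm placed in the body sits right next to the content it refers to:

```xml
<!-- Hypothetical paragraph: the index entry is anchored where the discussion occurs -->
<p>An inventory tells you what content you have and where it
lives.<indexterm>content audits<indexterm>inventory</indexterm></indexterm>
Once the inventory is complete, you can decide what to keep.</p>
```

Because the entry is anchored inside the paragraph, the page number in the index follows that paragraph even if the topic breaks across pages.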

Because Content Strategy 101 is narrative, the topics are a bit longer than you might see in technical content; topic length played a big part in our decision. Even so, a short DITA topic with three or four paragraphs could split across pages in a PDF file. Therefore, when considering indexterm element placement, you have to balance the needs of your readers against your localization and reuse requirements. If you need precisely placed index entries but still want source content that’s streamlined for localization, talk with your localization vendor about where in the body of topics those entries should go.

Note: Even though we decided to place indexterm elements within the body of topics, we still ran up against a few problems in regard to where indexterm elements are allowed. For example, you can’t place an indexterm element in the elements of a definition list (dt, dd) without wrapping the indexterm element in a ph element. I’m sure there are reasons the DITA specification doesn’t allow indexterm within dt and dd elements, but I don’t know what they are.
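The ph workaround described in the note might look something like this. It is a sketch with an invented definition list entry, not markup from the book:

```xml
<!-- Hypothetical definition list: each indexterm is wrapped in a ph element -->
<dl>
  <dlentry>
    <dt>Content audit<ph><indexterm>content audits</indexterm></ph></dt>
    <dd>A review of the content you already have.<ph><indexterm>content
      audits<indexterm>definition</indexterm></indexterm></ph></dd>
  </dlentry>
</dl>
```

The ph element carries no text of its own here; it exists only to give the indexterm a legal parent.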

I put indexterm elements in my topics, so why does my PDF output have no index entries?

After you spend time adding index entries to DITA source files, it’s very annoying to generate PDF output through the DITA Open Toolkit and get no index entries in your PDF file. Yep, that’s right. If you use the Apache FOP processor that comes with the Open Toolkit, you will not get index entries in output based on the default PDF plugins.

It’s enough to make anyone feel downright stabby.

To get index entries in your PDF output, you have a few options:

  • Recode the index processing in the PDF plugin to work with the FOP processor. (I can hear your screams. Writing XSL-FO code isn’t my cup of tea, either. I prefer to let Simon Bate do the dirty work.)
  • Buy a plugin that includes index processing FOP can understand. (Here’s where I shamelessly plug Scriptorium’s PDF plugin, which we adapted for Content Strategy 101. Simon recently updated the plugin for the 1.6.2 release of the DITA Open Toolkit.)
  • Buy a proprietary FO formatter (such as the Antenna House Formatter) that will render the index information generated by the PDF plugin. For the record, we used the Antenna House Formatter for Content Strategy 101 and other Scriptorium Press titles authored in DITA.

None of those options is inexpensive, and each perfectly encapsulates how DITA is free but not cheap.

After using FrameMaker for years, I’m accustomed to typing colons to separate primary and secondary index entries. How difficult is it to break that habit?

Oh, it’s very hard. My first pass at the index code was full of colons because of FrameMaker muscle memory. I would type

<indexterm>hello:world</indexterm>

instead of

<indexterm>hello<indexterm>world</indexterm></indexterm>

Because so many technical authors have used FrameMaker and are primed to type colons while indexing, the DITA Open Toolkit lets you type colons to create nested index entries for PDF output generated from the pdf2 plugin. Starting with toolkit version 1.5.4, a toggle was added to control whether FrameMaker indexing syntax is supported; by default, support is turned off.

In version 1.6.2 of the Open Toolkit, the toggle is the org.dita.pdf2.index.frame-markup property in the DITA-OT1.6.2\lib\configuration.properties file. From what I can tell, FrameMaker syntax support works only with the pdf2 plugin in the Open Toolkit. Therefore, if you have outputs to generate based on the other default toolkit plugins, you’ll still have problems with the colons unless you update the transforms to handle FrameMaker syntax in indexing. (Ugh.)

Next week, you can read more about PDF output when our Perils of DITA Publishing series continues with the curious case of the PDF plugin. Stay tuned!