The PDF landscape for DITA content

Sarah O'Keefe / DITA

published in STC Intercom, May 2010

A condensed version of Creating PDF files from DITA content.

Download the PDF (130K)


There are numerous alternatives for producing PDF output from DITA content. The approach you choose will depend on your output requirements—do you need images floating in text, sidebars, and unique layouts on each page? How often do you republish content? How much content do you publish? Do you need to create variants for different audiences? Do you provide content in multiple languages?

This article describes several common approaches and what requirements they support best. Your options include the following:

  • The DITA Open Toolkit (OT) with an Extensible Stylesheet Language Formatting Objects (XSL-FO) processor (Apache FOP, Antenna House XSL Formatter, or RenderX XEP).
  • XML authoring tools that work with the DITA OT and an XSL-FO processor to produce PDF files. They let you author XML and create PDF files from a single interface (JustSystems XMetaL Author, SyncRO Soft oXygen, and Quark DITA Studio).
  • Conversion tools that produce PDF files from DITA files (WebWorks ePublisher).
  • Help authoring tools that import DITA files and have built-in PDF file conversion capabilities (MadCap Flare and Adobe RoboHelp).
  • Page-based publishing software that imports DITA files and has built-in PDF file conversion capabilities (Adobe InDesign and FrameMaker).
  • High-capacity enterprise publishing tools that work with the DITA OT and produce PDF files (SDL XySoft XML Professional Publisher and Arbortext Publishing Engine).

DITA Open Toolkit with XSL-FO processor

The DITA Open Toolkit includes support for PDF output via XSL-FO. By default, the output created through the Open Toolkit is ugly, and customizing the XSL-FO code is a daunting task. The advantages of the Open Toolkit are automation and licensing cost. You run the Open Toolkit from the command line, and it’s possible to integrate the Open Toolkit with automated build systems. If you use the free FOP processor, you can generate PDF without any software licensing costs. The commercial FO processors cost up to $5,000 but have better functionality than FOP. Configuring the Open Toolkit to produce even reasonably attractive pages requires significant technical skills and is not for the faint of heart.
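As a rough illustration of the command-line automation described above, the following sketch assembles a DITA Open Toolkit invocation in Python so that a build system could call it. It assumes a toolkit release that ships the `dita` command with `--input`, `--format`, and `--output` options; releases contemporary with this article were driven through Ant instead. The file names `userguide.ditamap` and `out/pdf` are hypothetical.

```python
import shutil
import subprocess


def build_pdf_command(ditamap, output_dir, dita_cmd="dita"):
    """Return the argument list for a PDF build (nothing is executed here)."""
    return [
        dita_cmd,
        f"--input={ditamap}",
        "--format=pdf",
        f"--output={output_dir}",
    ]


def run_pdf_build(ditamap, output_dir, dita_cmd="dita"):
    """Run the build; raise if the toolkit is missing or the build fails."""
    if shutil.which(dita_cmd) is None:
        raise FileNotFoundError(f"{dita_cmd!r} was not found on PATH")
    subprocess.run(build_pdf_command(ditamap, output_dir, dita_cmd), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_pdf_command("userguide.ditamap", "out/pdf")))
```

Keeping command assembly separate from execution lets a build script log or inspect the command before running it. The formatting itself is still controlled by the toolkit's XSL-FO customization, not by this wrapper.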

DITA-capable XML authoring tools with PDF file conversion

DITA authoring tools, such as XMetaL Author and oXygen, provide a way to run the DITA Open Toolkit from within the authoring environment. This approach is friendlier than requiring an author to kick off PDF generation from the command line, but the configuration process still requires you to modify the DITA Open Toolkit. The authoring tools do provide a way to specify some parameters (such as conditional settings). The software licenses include a rendering engine: FOP for oXygen and XEP (a more robust commercial engine) for XMetaL.
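Conditional settings of this kind are commonly expressed as a DITAVAL filter file. The sketch below is one way to generate such a file, assuming the standard `val` and `prop` elements with `att`, `val`, and `action` attributes. The condition values shown are hypothetical, and how the file is handed to the toolkit or selected in an authoring tool depends on the product and version.

```python
import xml.etree.ElementTree as ET


def build_ditaval(excluded):
    """Build a DITAVAL document that excludes the given attribute values.

    `excluded` maps a conditional attribute name (for example "audience")
    to the list of values that should be filtered out of the output.
    """
    root = ET.Element("val")
    for att, values in excluded.items():
        for value in values:
            ET.SubElement(root, "prop", att=att, val=value, action="exclude")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical condition values for an end-user variant of a guide.
    print(build_ditaval({"audience": ["administrator"], "platform": ["linux"]}))
```

Generating one filter file per audience or platform is a simple way to produce variants of the same source in an automated build.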

WebWorks ePublisher 2009

WebWorks ePublisher is a conversion tool that includes the DITA Open Toolkit and XEP, so once again, you face a difficult Open Toolkit configuration project to produce the output you want. If you use ePublisher for other outputs or want some of the automation that ePublisher can provide, this might be a good option.

Another tool in this class is DITA2GO. As of this writing (March 2010), DITA2GO has announced that PDF output is in development.

Help authoring tools with DITA import and PDF file conversion

Adobe RoboHelp and MadCap Flare both have the ability to import DITA content and render it to PDF. The process of configuring these tools is much easier than working in the DITA Open Toolkit. However, since these tools are intended primarily for help output rather than PDF, the options for print formatting tend to be somewhat limited. These tools also allow you to generate HTML-based help (often called WebHelp), so if you need that output and are happy with a simple, but respectable, PDF look, a help authoring tool might be the way to go.

If you have DITA specializations, you will probably need to transform your files back to unspecialized DITA (a process known as generalization) before importing into Flare.
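To show what transforming specialized content back to base DITA involves, here is a minimal sketch that renames each element to the least specialized name listed in its `class` attribute. It assumes the `class` attributes are written out explicitly in the input; in real files they are usually defaulted by the DTD or schema, which Python's `xml.etree.ElementTree` does not apply. The function names are illustrative and do not come from any tool.

```python
import xml.etree.ElementTree as ET


def base_element_name(class_value):
    """Return the least specialized element name from a DITA class value.

    A class value such as "- topic/li task/step " lists module/element
    pairs from the base type to the most specialized type.
    """
    tokens = class_value.split()
    # tokens[0] is the "-" or "+" prefix; tokens[1] is the base pair.
    return tokens[1].split("/", 1)[1]


def generalize(xml_text):
    """Rename every element that carries a class attribute to its base name."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        class_value = elem.get("class")
        if class_value and len(class_value.split()) >= 2:
            elem.tag = base_element_name(class_value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<task class="- topic/topic task/task " id="t1">'
        '<title class="- topic/title ">Install</title>'
        '<taskbody class="- topic/body task/taskbody ">'
        '<steps class="- topic/ol task/steps ">'
        '<step class="- topic/li task/step ">'
        '<cmd class="- topic/ph task/cmd ">Run setup.</cmd>'
        '</step></steps></taskbody></task>'
    )
    print(generalize(sample))
```

The `class` attributes are left in place so the original types remain recoverable. A production generalization pass would also have to handle the DOCTYPE declaration and defaulted attributes, which this sketch ignores.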

Page layout software with PDF output

The traditional page layout tools, including FrameMaker, InDesign, and QuarkXPress, can accept DITA content. Once the information is in the page layout application, it is treated like any other content, so you can take advantage of all the layout features. FrameMaker’s DITA support is much better than that of the other page layout tools and can be further improved with the third-party DITA-FMx plug-in.

The major advantage of page layout software is that you can see the exact layout and pagination and make adjustments before creating the PDF output. This workflow increases the cost of production, but may be worthwhile for highly designed publications.

Enterprise publishing tools

For enterprise publishing requirements, consider tools such as XML Professional Publisher (XPP) or Arbortext Publishing Engine. XPP is intended for high-volume, intricately formatted publications, such as in financial publishing, and allows users to make adjustments to formatting before generating the final output. The Arbortext Publishing Engine also has high-end features such as change bars and column-wide footnotes. Enterprise-class tools can address formatting requirements that none of the other options will support.

Conclusion

Evaluate the following factors to select your DITA-to-PDF file process:

  • Automation. If automated production is a priority, avoid the page layout tools and the temptation to reach into the intermediate layout files. Instead, consider the DITA Open Toolkit and choose your FO processor based on formatting requirements.
  • Formatting requirements. Are ligatures, attractive justification, and hyphenation critical? Do you have requirements, such as mixed columns on a single page, that the XSL-FO processors cannot support? You probably need page layout software. On the other hand, if your formatting requirements are simple, you can probably use any of the options discussed here; look at other evaluation factors.
  • Difficulty of configuration. If you want to minimize the difficulties in formatting your output, consider a help authoring tool. If you want a technical challenge, go with the Open Toolkit.
  • Formatting adjustments. If hand-tweaking the formatting before generating the final output is a requirement, do not use any of the options based on the Open Toolkit. You can adjust formatting in the various help authoring, page layout, and enterprise publishing tools.
  • Cost. The combination of the DITA Open Toolkit and the FOP processor is free. All of the other tools have at least some cost.
  • Existing templates. If you already have formatting templates in a specific tool, consider using that tool to produce your DITA PDF output. For example, an organization that already has an unstructured FrameMaker template or an InDesign template that meets all of its requirements might stay in those tools to take advantage of the existing template files.
  • Language support. If you need to support a wide variety of languages, verify that your languages are supported or can be supported by the tools you are considering. In particular, right-to-left languages, such as Hebrew and Arabic, are not widely supported. The DITA Open Toolkit actually has excellent language support.

Many technical communicators equate XML or DITA authoring with ugly PDF output. The default output provided through the DITA Open Toolkit is certainly rudimentary. However, there is no technical reason that PDF from the Open Toolkit should be ugly, and XSL-FO consultants are available. If automation is not a high priority, a help authoring or page layout tool could provide a reasonable alternative with a smaller learning curve.

This article is a condensed summary of Creating PDF files from DITA content.

About the Author

Sarah O'Keefe

Content strategy consultant and founder of Scriptorium Publishing. Bilingual English-German, voracious reader, water sports, knitting, and college basketball (go Blue Devils!). Aversions to raw tomatoes, eggplant, and checked baggage.
