XPubs: XSL-FO for Documentation Formatting

Sarah O'Keefe / Conferences

Mike Miller, Antenna House

For starters, XSL-FO is an XML standard.

XSL-FO is “a pagination markup language describing a rendering vocabulary capturing the semantics of formatting information for paginated presentation.” (Ken Holman)

Or, as I like to say, “A document layout described in a text file.”

XSL-FO is black-box formatting: you can’t go back and “tweak” the files to fix them. With FO, you’re typically talking about a minimum of a couple hundred pages, which is much faster to render automatically than to lay out by hand in InDesign or FrameMaker.

First commercial products in 2001 from Antenna House and RenderX. Also, open source FOP from Apache in 2001. FO successful in the sense that both commercial companies are doing quite well.

FO more successful than any other technical publishing application other than perhaps TeX and FrameMaker. Probably attributable to the availability of open source (free) and trial versions from commercial vendors (free).

XSL-FO is only concerned with visual display of XML data, which means that the FO file has no semantic content, only formatting instructions.
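
To make that concrete, here is a minimal Python sketch of what happens to semantic markup on its way to FO. The source elements (`task`, `warning`, `step`) and the formatting values are made up for illustration, and a real pipeline would use an XSLT stylesheet rather than hand-written Python; the only point is that the element names disappear and only formatting properties remain.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical source markup: the element names carry meaning.
SOURCE = (
    "<task>"
    "<warning>Disconnect power first.</warning>"
    "<step>Remove the cover.</step>"
    "</task>"
)

# Hypothetical formatting rules, kept apart from the content.
RULES = {
    "warning": {"font-weight": "bold", "color": "red", "space-after": "6pt"},
    "step": {"font-size": "10pt", "space-after": "4pt"},
}


def to_fo_blocks(source_xml):
    """Turn each child element into an fo:block that carries only formatting."""
    blocks = []
    for child in ET.fromstring(source_xml):
        block = ET.Element(f"{{{FO_NS}}}block", RULES.get(child.tag, {}))
        block.text = child.text
        blocks.append(block)
    return blocks


blocks = to_fo_blocks(SOURCE)
for block in blocks:
    # The serialized blocks no longer say "warning" or "step" anywhere.
    print(ET.tostring(block, encoding="unicode"))
```

Once the content is in FO, nothing in the file says that the first block was a warning; it is just bold red text.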

The FO stylesheet specifies:

  • page areas and sets of pages to be used to compose a document for paper (master pages)
  • Text flows, areas on pages into which the text and graphics are filled
  • Blocks within flow areas (paragraphs)
  • Inline areas (character-level formatting)
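
The four layers in that list map onto a very small FO file. The sketch below is not from the talk; it embeds a minimal FO document in Python (page size, margins, master name, and text are arbitrary choices) and walks it to show the nesting from master page down to inline.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"

MINIMAL_FO = f"""<fo:root xmlns:fo="{FO_NS}">
  <!-- Page areas: one A4 master page with a body region -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="body-page"
        page-height="297mm" page-width="210mm"
        margin-top="20mm" margin-bottom="20mm"
        margin-left="20mm" margin-right="20mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- A set of pages built from that master -->
  <fo:page-sequence master-reference="body-page">
    <!-- Text flow: content poured into the body region -->
    <fo:flow flow-name="xsl-region-body">
      <!-- Block: one paragraph -->
      <fo:block font-size="11pt" space-after="6pt">
        A paragraph with
        <!-- Inline: character-level formatting -->
        <fo:inline font-style="italic">emphasized</fo:inline> text.
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>"""


def local_names(fo_text):
    """List element names in document order, without the namespace part."""
    root = ET.fromstring(fo_text)
    return [element.tag.split("}")[1] for element in root.iter()]


names = local_names(MINIMAL_FO)
print(names)
```

Saved to a file, the `MINIMAL_FO` text should be acceptable input for an FO processor, though I have only checked here that it is well-formed XML.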

Advantages:

  • Processing and formatting are consistent and automatic.
  • Formatting rules are stored separately from the data.
  • FO is non-proprietary and human-readable (well, sort of)
  • FO less complicated than programming Java or Perl and the like
  • Can use stylesheets with different XSLT processors (DITA Open Toolkit)
  • Easier integration with other XML standards compliant applications (not trivial, but much easier than other non-standard approaches)

Antenna House has been personally involved in about 30 different DITA projects.

Most business documents can be formatted automatically as FO. Rule of thumb: “If it’s XML, FO can be applied.”

Other applications for FO might include faxes, German railway tickets, correspondence from financial institutions and government.

Typesetting is very complex with issues like widows and orphans and hyphenation. Software can handle this. Human typesetters have been removed from the process, and this shows in amateurish mistakes. But you can use FO to configure something that follows typography rules and gives you a professional look and feel.
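
As a rough idea of what “configuring typography rules” looks like, the Python sketch below attaches standard XSL-FO properties (`widows`, `orphans`, `hyphenate`, `keep-with-next.within-page`, and the hyphenation character counts) to blocks. The specific values are hypothetical house rules, not recommendations from the talk, and how well each property is honored depends on the FO engine.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical house rules; the property names are standard XSL-FO.
BODY_RULES = {
    "widows": "3",                              # min. lines at top of next page
    "orphans": "3",                             # min. lines left at page bottom
    "hyphenate": "true",
    "language": "en",
    "hyphenation-remain-character-count": "3",  # min. characters before a hyphen
    "hyphenation-push-character-count": "3",    # min. characters after a hyphen
}
HEADING_RULES = {
    "keep-with-next.within-page": "always",     # never strand a heading
    "hyphenate": "false",
}


def make_block(text, rules):
    """Create an fo:block with the given typography properties."""
    block = ET.Element(f"{{{FO_NS}}}block", rules)
    block.text = text
    return block


heading = make_block("Installation", HEADING_RULES)
body = make_block("Before installing the unit, read all safety notes.", BODY_RULES)

for block in (heading, body):
    print(ET.tostring(block, encoding="unicode"))
```

In practice these attributes would be set once in the stylesheet, so every paragraph and heading gets the same treatment without anyone touching individual pages.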

“Overwhelming benefits” of using FO. Which raises the question: “Why aren’t more people using it?” A slide with the benefits of XML showing The Usual (cost, time-to-market, less redundancy, standards-based, localization for cost justification, etc.).

People who use FO: auto manufacturers, cell phone manufacturers, banks, aerospace, government, military, educational

FO not appropriate for documents that are “artistically created.”

FO extensions provide support for:

  • Document info in PDF
  • Bookmarks for PDF
  • Column footnotes
  • Revision bars
  • MathML
  • Embedding PDF within PDF
  • Column rules
  • Punctuation spacing
  • Table autospace
  • Floats
  • Advanced hyphenation
  • Barcodes
  • Several hundred extensions altogether. Antenna House addresses multilingual requirements with extensions, such as special spacing requirements in Japanese or justification in Arabic through kashidas.

Thus, if you need one of these features, you might get somewhat locked into your rendering engine…the extensions are specific to a particular FO engine.
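
One way to gauge that lock-in is to scan an FO file for vendor extension namespaces. The Python sketch below is my own illustration, not something shown in the session. The namespace URIs are the ones I believe the three engines use; treat them as assumptions and check them against each vendor's documentation. The sample is a fragment for scanning only, not a complete FO document.

```python
import xml.etree.ElementTree as ET

# Assumed extension namespaces; verify against each vendor's documentation.
VENDOR_NAMESPACES = {
    "http://www.antennahouse.com/names/XSL/Extensions": "Antenna House",
    "http://www.renderx.com/XSL/Extensions": "RenderX",
    "http://xmlgraphics.apache.org/fop/extensions": "Apache FOP",
}

# Fragment with one Antenna House extension element (PDF document info).
SAMPLE = """<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions">
  <axf:document-info name="title" value="User Guide"/>
  <fo:block>Hello</fo:block>
</fo:root>"""

PLAIN = """<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:block>Hello</fo:block>
</fo:root>"""


def namespace_of(qualified_name):
    """Return the namespace URI of a '{uri}local' name, or '' if it has none."""
    if qualified_name.startswith("{"):
        return qualified_name[1:].split("}")[0]
    return ""


def vendor_extensions(fo_text):
    """Return the sorted vendors whose extension elements or attributes appear."""
    found = set()
    for element in ET.fromstring(fo_text).iter():
        for name in [element.tag, *element.attrib]:
            vendor = VENDOR_NAMESPACES.get(namespace_of(name))
            if vendor:
                found.add(vendor)
    return sorted(found)


print(vendor_extensions(SAMPLE))
print(vendor_extensions(PLAIN))
```

An empty result suggests the file sticks to the standard and should be portable across engines, at least in principle.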

DITA Open Toolkit reduces the complexity of getting set up and producing PDF. Could be configured and producing PDF in “a couple of hours.” (Perhaps, but making it look the way you want is going to take a while.) According to Mike, somewhere between a few days and a few months, depending on the complexity of your requirements.

PDF output from DITA

  • XSL-FO
  • FrameMaker
  • troff

Stages:

  • Preprocessing. Information is parsed and assembled.
  • Transformation. Output is formatted and generated.

Several software components are required — DITA Open Toolkit provides all the components you need.

Why not FrameMaker or InDesign?

  • Formatting is the tip of the iceberg. (WYSIWYG)
  • WYDSIWYN — What you don’t see is what you need, which includes content management, automated formatting, multilingual formatting, global access, project tracking, electronic delivery, network integration

You need WYSIWYG if:

  • You need to manually lay out pages.
  • No fixed page style
  • Need to modify page layout
  • Unstructured document formats
  • Document format is continuously changing
  • Unstructured content

If you need WYSIWYG, you need a layout engine like FrameMaker or InDesign. If you need WYDSIWYN, you need XSL-FO.

On the low end, FO is free with FOP. Antenna House is the most expensive at $1,250 for a stand-alone license or $5,000 for a server license.

FO supports more languages than any other solution currently available.

Solving the real problem:

  • Improve the total process, not just individual tasks
  • Improve organizational effectiveness

XSL-FO is delivering on the XML promise. Don’t underestimate it.

First question, from a professional typesetter: Flowing text into a typesetting engine results in line breaks that will cause readers difficulty, and this annoys him. We want powerful, automated formatting AND the ability to do WYSIWYG tweaks. He thinks there is a role for a WYSIWYG stage after the automation bit.

I’ve noticed this on the BBC, too. British people ask really pointed questions.

And in response, Mike says that Antenna House has a solution for this where you create INX (InDesign XML) content (4 minutes) and then you can pull it into InDesign (half an hour), and do some cleanup.

Do all the XSL-FO tools cover 100% of the FO standard? “No, definitely not.”

About the Author

Sarah O'Keefe


Content strategy consultant and founder of Scriptorium Publishing. Bilingual English-German, voracious reader, water sports, knitting, and college basketball (go Blue Devils!). Aversions to raw tomatoes, eggplant, and checked baggage.
