October 23, 2012

Perils of DITA publishing, part 8: PDF Acrobatics, revisited

A few weeks ago, I described some of the issues we faced in producing a PDF of Content Strategy 101 from DITA sources. Time and space didn’t permit me to finish out the list of changes. Now I can.

The issues I didn’t address were:

  • Improved widow and orphan handling in tables and lists
  • Extensive table customization
  • Options for formatting definition lists (dl element) as a table or as a list
  • Support for two-column output of simple lists
  • Automatic generation of cover and copyright pages from bookmap metadata

Widow and orphan handling in tables and lists

Widow and orphan handling is one of those areas where XSL-FO is often criticized. However, I was able to fix a number of problem areas, particularly table rows and lists. The solution lies in XSL-FO itself (its keep properties) and in the XSL attribute sets.

Unlike CSS, where the selectors are static and know nothing about the context of the HTML tags to which they will be applied, XSL attribute sets are evaluated every time a stylesheet references them with xsl:use-attribute-sets. This means the attribute set can query the current context in which it will be applied. I modified the table row and list item attribute declarations so that the current element determines where it is in relation to its siblings. If the row or list item is near the beginning or end of a table or list, the keep-with-previous or keep-with-next attribute value is set to “always.”
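The position test can be sketched outside of XSLT. The following Python is only an illustration of the logic, not the stylesheet code I wrote; the function name keep_properties and the guard parameter (how many items at each end are held together) are assumptions made for the example.

```python
def keep_properties(position, count, guard=2):
    """Return the XSL-FO keep properties for one row or list item.

    position: 1-based index of the item among its siblings
    count:    total number of sibling rows or items
    guard:    number of items held together at each end (assumed value)
    """
    if position < 1 or position > count:
        raise ValueError("position must be between 1 and count")

    props = {}
    # Near the start: tie the item to the one after it, so the first
    # item is not stranded alone at the bottom of a page.
    if position < min(guard, count):
        props["keep-with-next"] = "always"
    # Near the end: tie the item to the one before it, so the last
    # item is not stranded alone at the top of a page.
    if position > max(count - guard + 1, 1):
        props["keep-with-previous"] = "always"
    return props


if __name__ == "__main__":
    for pos in range(1, 6):
        print(pos, keep_properties(pos, 5))
```

In a stylesheet, the equivalent test would sit inside the attribute set itself and compare the current element's position with the number of its siblings.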

Extensive table customization

In the earlier blog post, I described how I modified the templates so that table titles could be repeated across page breaks. I also implemented several other features that improved table processing. One was the widow and orphan control described above. The other main improvement was allowing us to specify the properties for table rules in the basic-settings file.

Although we used the Antenna House Formatter to generate the final PDFs, Alan and Sarah used Apache FOP to produce their own review drafts. To work around FOP’s shortcomings, I had to modify the margin handling for table cells when Apache FOP was the FO processor.

Options for formatting definition lists (dl element) as a table or as a list

When formatting definition lists (dl element), the default behavior of the pdf2 transform is to display the list as a table. Unfortunately, when the term (dt) is a long, unbreakable string, this doesn’t work particularly well. Also, when the definition list is more expository in nature (it has longer descriptions), the table is not an attractive presentation. To solve this, I enabled the authors to use the outputclass attribute to specify how to present the list: either as a traditional table or as a glossary-style list, with the term on one line and the definition indented, starting on a new line.

Source block dl

Output from a block dl
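As a rough illustration of the outputclass switch, here is a Python sketch that builds either layout as XSL-FO elements. It is not the pdf2 template code: the outputclass value "block", the helper names fo and render_dl, and the formatting values (bold term, 2em indent) are all assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)


def fo(tag, parent=None, **attrs):
    """Create an fo:* element; keyword underscores become hyphens."""
    name = f"{{{FO}}}{tag}"
    attrs = {key.replace("_", "-"): value for key, value in attrs.items()}
    if parent is None:
        return ET.Element(name, attrs)
    return ET.SubElement(parent, name, attrs)


def render_dl(dl):
    """Render a DITA dl as a glossary-style list or as a two-column table."""
    entries = dl.findall("dlentry")
    if "block" in dl.get("outputclass", "").split():  # assumed value
        out = fo("block")
        for entry in entries:
            # Term on its own line, kept with the definition that follows.
            term = fo("block", out, font_weight="bold", keep_with_next="always")
            term.text = entry.findtext("dt")
            # Definition starts on a new line, indented.
            definition = fo("block", out, start_indent="2em")
            definition.text = entry.findtext("dd")
        return out

    # Default: one table row per entry, term and definition side by side.
    table = fo("table")
    body = fo("table-body", table)
    for entry in entries:
        row = fo("table-row", body)
        for child in ("dt", "dd"):
            cell = fo("table-cell", row)
            fo("block", cell).text = entry.findtext(child)
    return table


if __name__ == "__main__":
    source = ET.fromstring(
        '<dl outputclass="block">'
        "<dlentry><dt>bookmap</dt><dd>A map for book-style output.</dd></dlentry>"
        "</dl>"
    )
    print(ET.tostring(render_dl(source), encoding="unicode"))
```

The sketch handles only plain-text dt and dd content; real topics also contain inline markup that the actual templates have to process.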

Support for two-column output of simple lists

The pdf2 plugin formats a simple list (sl element) as a plain list with no bullets, in a single (and potentially quite long) column. For lists longer than five or six items, this results in unattractive output. As with the dl element handling, I enabled the authors to use the outputclass attribute to specify whether the list is formatted as a single-column or a two-column list. (For future implementations, I might allow authors to specify how many columns to use.)

Source two-column sl

When the author specifies “twocol,” I divide the sli elements into two groups, then create an unruled two-column table with a single row. In each cell of the table, I then apply the standard pdf2 behavior to one of the two groups of sli elements.

Output from a two-column sl
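The grouping step is simple enough to show in a few lines of Python. This is a sketch only, and it assumes that when the list has an odd number of items, the extra item goes in the first column; the function name split_two_columns is made up for the example.

```python
def split_two_columns(items):
    """Split a list of sli items into (left, right) column groups.

    Assumption: with an odd number of items, the left column gets the extra one.
    """
    half = (len(items) + 1) // 2  # round up so the left column is never shorter
    return items[:half], items[half:]


if __name__ == "__main__":
    left, right = split_two_columns(["alpha", "bravo", "charlie", "delta", "echo"])
    print(left)   # ['alpha', 'bravo', 'charlie']
    print(right)  # ['delta', 'echo']
```

Each group then goes into one cell of the single-row table, so reading order runs down the first column and then down the second.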

Automatic generation of cover and copyright pages from bookmap metadata

Although it appears last in this posting, this is the one feature that just about all clients ask us for: “Can you build a cover page and put our copyright info on the inside cover?” After considering the possible bookmeta information and what would be required in most copyright pages, I implemented a transform that builds a cover page (including three levels of titles, plus a document type) along with a logo and optional cover image. The transform also populates the second page with copyright and trademark information, and optional mailing address, URL, e-mail address, and ISBN.

Because this information varies from client to client, I created a variable in the basic-settings file that specifies the order in which this information appears on the second page.  This means that I didn’t just create one set of cover page transforms for the Content Strategy 101 book; the transforms I created will enable us to quickly create covers for many clients in the future.
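To show the idea behind the ordering variable, here is a small Python sketch. The field names, the space-separated format of the order setting, and the function name ordered_copyright_lines are all hypothetical; the real setting lives in the basic-settings file and is read by the XSLT.

```python
def ordered_copyright_lines(metadata, order):
    """Return the copyright-page entries in the configured order.

    metadata: dict of values pulled from bookmeta (hypothetical keys)
    order:    space-separated list of keys, as a settings variable might hold
    Entries that are missing or empty in the metadata are skipped.
    """
    return [metadata[key] for key in order.split() if metadata.get(key)]


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    bookmeta = {
        "copyright": "Copyright 2012 Example Publisher",
        "trademark": "All trademarks belong to their owners.",
        "url": "www.example.com",
        "isbn": "",  # optional, not supplied for this client
    }
    for line in ordered_copyright_lines(bookmeta, "copyright trademark isbn url"):
        print(line)
```

Changing the order for a different client then means editing one setting rather than touching the cover page templates.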