


Tech Tips: Quick Word to DITA table conversion

The other day I had to convert a large table from Word to DITA. I started looking at Word XML output and thought about transforming it with XSL (which I have done in the past), but that seemed to be too much trouble for this document. Then I remembered a technique an old SQL coder showed me for loading large amounts of data into a SQL table.  I realized this technique could be readily adapted to DITA.


The PDF landscape for DITA content

published in STC Intercom, May 2010

A condensed version of Creating PDF files from DITA content.

Download the PDF file (130K)

There are numerous alternatives for producing PDF output from DITA content. The approach you choose will depend on your output requirements—do you need images floating in text, sidebars, and unique layouts on each page? How often do you republish content? How much content do you publish? Do you need to create variants for different audiences? Do you provide content in multiple languages?

DITA Webcast

Webcast: Demystifying DITA to PDF publishing

When you implement a DITA-based workflow, you face myriad new challenges, such as getting accustomed to topic-based writing, exploring reuse strategies, and understanding specialization. The most difficult technical obstacle is usually setting up a PDF/print publishing workflow. The DITA Open Toolkit provides very basic PDF output, but organizations that require attractive, professional-looking PDF content need extensive and expensive customization. FrameMaker is easier to configure than the Open Toolkit and produces lovely PDF files, but can you work around the limitations of its DITA support? InDesign offers the highest-quality typography but has significant limitations in working with structured content. This session discusses the advantages and disadvantages of each approach to producing PDF from DITA content.

This session is intended for individuals who are considering a DITA implementation and expect to need PDF output. Basic familiarity with DITA, XML, and related technologies is helpful but not required.
NOTE: During the recording, the presenters will mention polls. You will not see these polls while viewing the recording, but the presenters will describe the results.

DITA Webcast

2010: A DITA Odyssey

When you’re considering tools for authoring DITA content and creating output, there are many choices to evaluate. To make your journey toward DITA implementation easier, Scriptorium is offering free webinars in early 2010 to show you how three tools handle DITA-based information.

Odysseus in front of Scylla and Charybdis by Johann Heinrich Füssli (Wikipedia)

On January 19, Sarah O’Keefe will show you how MadCap Flare supports DITA constructs, and on February 16, Simon Bate will demonstrate the DITA features in the oXygen XML editor. On March 16, Scott Prentice of Leximation will demonstrate how the DITA-FMx plugin works with FrameMaker 9.

As an added bonus, attendees can win a free license of the tool shown during each demo! For more information about these sessions and to register, visit our events page.

If there are other topics you’d like to see covered in later free webcasts, please send suggestions to [email protected].


Automated trademarking in structured documents – DITA in particular

Unabashed plug warning: The following entry gives a conceptual overview of a solution Scriptorium has implemented for managing trademarks in structured documents. And we’re proud of it.

You know the problem. According to your style standards, only the first instance of a given trademarked term should display the trademark symbol. Structured documentation lets you reuse document parts (such as DITA topics) in just about any order you like. In Manual A, the first file containing the trademarked text is, say, Topic A; in Manual B, the first file containing the trademarked text is Topic E, which is also used in Manual A. Where do you put your trademark markup, and how do you maintain it when you produce Manual A and Manual B at approximately the same time?

Maintaining the trademarks by hand adds a level of effort that becomes non-negligible when you start considering a large number of manuals. And the process becomes error-prone – those darned human beings. Different writers might tag things in different ways, trademarks might escape notice, or markup might be inserted in inappropriate places by accident.

Isn’t this one of those problems that automated documentation was supposed to solve, not create? I once had a professor who said that computers were supposed to handle the problems that computers could solve, so people could work on the problems that only people can solve.

More than one of Scriptorium’s customers has presented us with this problem, so we know it is not uncommon. We have found a way to deal with the problem in DITA, and we believe that the principle is sufficiently generic to use in non-DITA structures as well.

To begin with, forget conditional processing. It won’t help you with the problem of marking only the first instance of a term. In the example of Manual A, above, setting the condition “Manual A” would still display the trademark in Topic A and Topic E. This is not what your editor wants – and he or she will let you know it in spades if he or she is any kind of editor at all.

Scriptorium’s solution for DITA, in simple outline, is as follows:

  1. Using XSL, go through the ditamaps and remove all trademarking from the document files.

  2. Following a predefined list of trademarked and registered trademarked terms, go through the ditamaps and identify the files that contain each term. Create a temporary file that lists the relevant files in order of book occurrence. (This step prevents having to crawl through the ditamaps more than once.)

  3. Using Perl, iterate through the files listed for each term in the temporary file. Check the occurrence of each instance of the term, in text order, and evaluate whether it is a valid occurrence that requires trademarking. If so, wrap the appropriate trademark markup around it and go to the next trademark. If not, keep going through the text and the list of files until you find a valid occurrence of this trademark.
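
Step 2 can be sketched in Python. This is a minimal illustration, not Scriptorium’s implementation: it assumes a plain ditamap whose entries are topicref elements (bookmap elements such as chapter are not handled), and it takes a hypothetical topic_text dictionary of file contents instead of reading the files from disk. A plain substring test is enough at this stage, because step 3 does the precise matching.

```python
import xml.etree.ElementTree as ET


def topics_in_book_order(ditamap_xml):
    """Return topicref hrefs in the order they appear in the map."""
    root = ET.fromstring(ditamap_xml)
    hrefs = []
    # iter() walks the tree depth-first in document order, so nested
    # topicrefs follow their parent, which matches book order.
    for ref in root.iter("topicref"):
        href = ref.get("href")
        if href and href not in hrefs:
            hrefs.append(href)
    return hrefs


def files_per_term(ditamap_xml, topic_text, terms):
    """Map each term to the topics that contain it, in book order.

    topic_text is a hypothetical dict of href -> topic source, standing
    in for the files on disk. The result plays the role of the
    temporary file described in step 2.
    """
    order = topics_in_book_order(ditamap_xml)
    return {
        term: [href for href in order if term in topic_text.get(href, "")]
        for term in terms
    }
```

Because the map is walked once and the result is reused for every term, the map never has to be crawled a second time.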

We possibly could have used XSL instead of Perl for the third step, but Perl’s text manipulation capability is much more robust than XSL’s, so we chose Perl.

In the implementation, the trademarking utility is coordinated by an Ant process. A user runs this utility just before the book is rendered for output. Because the process runs under Ant, it could probably be integrated into the DITA Open Toolkit build system fairly easily to create a seamless, one-step production process.

There are a number of interesting problems that arise during implementation. For example, in step 3 the process has to evaluate whether the instance of a term is valid for trademarking. Some kinds of non-valid instances of a term in the text might be:

  • The term is in an indexterm tag.

  • The term is in an href attribute.

  • The term is in a title.

  • The term is in a codeblock tag.
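
Here is one way step 3 and the validity checks above might look, written in Python with ElementTree rather than Perl. It is a sketch under stated assumptions: the skip list contains only the three element types named above, matches are whole words with exact case, and the inserted markup is DITA’s tm element with a tmtype attribute. Because only element text is searched, attribute values such as href are never touched.

```python
import re
import xml.etree.ElementTree as ET

# Elements whose text never gets a trademark symbol. These three come
# from the examples above; a real style guide will add more.
SKIP = {"title", "indexterm", "codeblock"}


def _make_tm(term, tmtype, tail):
    """Build a DITA <tm> element wrapping term; tail is the text after it."""
    tm = ET.Element("tm", {"trademark": term, "tmtype": tmtype})
    tm.text = term
    tm.tail = tail
    return tm


def mark_first(root, term, tmtype="tm"):
    """Wrap the first valid occurrence of term under root in <tm>.

    Only element text is searched, so attribute values such as href are
    never matched. Returns True if an occurrence was marked.
    """
    # Whole-word, case-sensitive match that also works for terms that
    # end in a symbol.
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")

    def walk(elem, skipped):
        skipped = skipped or elem.tag in SKIP
        # Text before the first child belongs to elem itself.
        if not skipped and elem.text:
            m = pattern.search(elem.text)
            if m:
                elem.insert(0, _make_tm(term, tmtype, elem.text[m.end():]))
                elem.text = elem.text[:m.start()]
                return True
        for i, child in enumerate(list(elem)):
            if walk(child, skipped):
                return True
            # A child's tail is text of elem, so it uses elem's context:
            # text after an <indexterm> is still eligible.
            if not skipped and child.tail:
                m = pattern.search(child.tail)
                if m:
                    elem.insert(i + 1, _make_tm(term, tmtype, child.tail[m.end():]))
                    child.tail = child.tail[:m.start()]
                    return True
        return False

    return walk(root, False)


def mark_book(topics, term, tmtype="tm"):
    """Mark term once across parsed topics given in book order (step 2)."""
    # any() stops at the first topic that reports a marked occurrence.
    return any(mark_first(root, term, tmtype) for root in topics)
```

Calling mark_book once per term, with the topics in the order produced by step 2, marks each term exactly once per book. Writing the trees back to disk and the Ant wiring are left out.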

You might also encounter a case where a trademarked term appears both in mixed case and in all uppercase. Per your style guide, only the first instance in either form should be marked, not the first instance of each form. That sort of requirement makes life just a little more interesting for a coder.
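
One way to meet that requirement is to put both forms into a single pattern, so there is only one “first instance” to find. The sketch below uses a hypothetical first_instance helper and assumes the variants are listed explicitly rather than matched case-insensitively, since case-insensitive matching would also catch lowercase forms that the style guide may not want marked.

```python
import re


def first_instance(texts, variants):
    """Return (text_index, start, end) for the first occurrence of any
    variant across texts, or None if no variant appears.

    All variants share one pattern, so once the first of them is found,
    no other form of the term counts as a first instance.
    """
    # Longest variants first, so a longer form wins over a prefix of it.
    ordered = sorted(variants, key=len, reverse=True)
    alternatives = "|".join(re.escape(v) for v in ordered)
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    for i, text in enumerate(texts):
        m = pattern.search(text)
        if m:
            return i, m.start(), m.end()
    return None
```

For example, first_instance(["Start WIDGETPRO first.", "Then WidgetPro."], ["WidgetPro", "WIDGETPRO"]) finds only the uppercase form in the first string.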

In general, the issue of trademarking first instances is not a simple problem to solve, and variations in style requirements will undoubtedly add complexity and challenges to the problem. But that’s what automated documentation is supposed to be good at, right? So we humans can get back to doing the more difficult problems that only people can solve.

I’m not sure – is that really such a good deal?


An incomplete puzzle: DITA OT stylesheets

A recent post on the dita-users Yahoo group asked how to customize the DITA OT stylesheets in view of the fact that there isn’t much documentation available.

From my work customizing and otherwise perverting the DITA OT, I can sympathize with these frustrations. When I started investigating OT customizations, I found many well-crafted tutorials on how to customize and specialize the OT. These were a great starting point, but they only got me so far. In its current state, the documentation is an incomplete jigsaw puzzle; the trees and buildings are filled in nicely, but the sky is still waiting for someone with patience. (Block that metaphor!)

Because there is no documentation available at the individual template level, you need to reconsider the task at hand. I look on it as debugging, decoding, or sleuthing. With that in mind, I find the following to be very useful:

  • Find a good visual grep-like utility. I use Agent Ransack, a free version of FileLocator Pro (it’s amazing). This enables me to locate all files that contain a particular class identifier. The visual aspect of the tool lets me see the context quickly.
  • Use a programmer’s editor that supports XML and XSL. We use oXygen. Not only does it help check validity and close tags automatically, but it also provides a handy sidebar that lists the templates and their modes.
  • Liberally spread <xsl:comment> or <xsl:message> directives through the stylesheets you’re examining. They help you figure out where you are. Use <xsl:value-of> or <xsl:copy-of> to see what you’ve got.
  • Once you’ve figured out what happens in one of the OT templates, add comments. Now the next time you come back to it, you won’t waste time.
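
If a visual search tool isn’t available, a few lines of Python can do the core of the same search. This sketch, built around a hypothetical find_class_refs helper, lists every line in the stylesheets under a directory that mentions a DITA class token such as ' topic/p ' (the OT stylesheets typically match elements with expressions like contains(@class, ' topic/p ')). It lacks the context view that makes a dedicated tool so useful.

```python
import os


def find_class_refs(root_dir, class_token, exts=(".xsl", ".xslt")):
    """Yield (path, line_number, line) for each stylesheet line under
    root_dir that contains class_token, for example ' topic/p '."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            # str.endswith accepts a tuple of suffixes.
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if class_token in line:
                        yield path, lineno, line.rstrip("\n")
```

A typical call would loop over find_class_refs("path/to/DITA-OT", " topic/p ") and print each hit; the directory name here is only a placeholder for wherever your Open Toolkit lives.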

Probably the best form of documentation that the OT could provide here is additional comments in the stylesheets, particularly about the order of processing. I find I add many comments about where to find the template that handles nodes from an <xsl:apply-templates> directive.

One further note. On Tuesday, September 23, I’ll be presenting the third of our “Best Practices in Structured Authoring and Publishing” joint Webinar series with JustSystems. In this presentation I’ll describe a number of approaches you can use to customize DITA OT output. For more information, visit the JustSystems web site.


The hidden costs of DITA

Originally published in STC Intercom, April 2008

DITA is a free, pre-made XML document structure. That statement can lead to a few erroneous assumptions: if it’s free, then it will cut down on costs, and if it’s pre-made, it will cut down on labor. There are several things to consider when choosing a DITA solution. Does your staff have the skills to author in a DITA environment? Will additional training be required? Does DITA even match your content model, and if it doesn’t, is it worth the effort to change?

Sarah’s conclusion? “DITA may be free, but it’s not cheap.”

Download the PDF (950K)
