August 10, 2012

Migration for the un(der)funded

Content migration from format A to format B is a challenge in the best of times. And then there are the worst of times, like the depressing situation in this message (published with permission from the author):

How do you suggest handling this situation: We are a 5-person team that is already maxed out resource-wise, and who is working in an Agile environment with overlapping sprints. We are currently working in unstructured FrameMaker, although we are very conscientious about our tag use. We are being asked by corporate to move to DITA after the first of the year. We don’t feel we have the time or the resources and we will not be afforded the luxury of having someone else convert our documents. Help!

To summarize, a small team is being asked to take on conversion of their content without any support or accommodation. This is a failure of management. (“But,” splutter the managers reading this, “what if they are exaggerating how busy they are to avoid the DITA conversion?”) If your team is actively trying to avoid DITA, that is still a management failure, because you have not sold them on the process. So this is either a legitimate “not enough resources” problem or a change management problem. Both are management’s responsibility.

[Image: DITA migration, shown as migrating geese spelling out “DITA”. Source: Flickr, thirdworld]

Let’s assume that our anonymous correspondent is not the only person facing this problem. Here are some suggestions (and we welcome additional input from our readers, anonymous or otherwise):

  1. Talk to management about the resource issue. Are they willing to negotiate on some other responsibilities to make room for the conversion project?
  2. Clean up your tagging first. 99% of our customers tell us that they are “very conscientious” about their tag use. Sometimes that description is even accurate. Files with clean tagging are much easier to convert automatically than the other kind, and cleanup before conversion is much more efficient than cleanup afterward.
  3. Know what you have. Topic-based content is much easier to convert to DITA, whereas conversion of poorly organized, repetitious content defeats the purpose altogether. If your content isn’t ready (from an information architecture standpoint), concentrate on getting it where it needs to be before worrying about conversion paths.
  4. Dig into the free stuff. There are lots of great free tools out there that work quite well (see below). With a little customization, most of these can be, er, persuaded to do almost anything.
  5. Be reasonable. No conversion is perfect.
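Before you rely on the “very conscientious” claim in point 2, count the tags you actually use. The Python sketch below assumes you have already pulled the paragraph tags out of your files as (tag, text) pairs. How you do that depends on your source; for FrameMaker, one option is saving as MIF, which is text-based. The approved tag list and the sample paragraphs are hypothetical.

```python
from collections import Counter

# Hypothetical list of paragraph tags that your conversion rules will map.
APPROVED_TAGS = {"Heading1", "Heading2", "Body", "Bullet", "Step"}


def audit_tags(paragraphs, approved=APPROVED_TAGS):
    """Count paragraph tags and report the ones not on the approved list.

    `paragraphs` is an iterable of (tag, text) pairs, however you extract them.
    Returns (counts for every tag, counts for unapproved tags only).
    """
    counts = Counter(tag for tag, _ in paragraphs)
    stray = {tag: n for tag, n in counts.items() if tag not in approved}
    return counts, stray


if __name__ == "__main__":
    sample = [
        ("Heading1", "Installing the widget"),
        ("Body", "Before you begin, unplug the widget."),
        ("Body", "You need a screwdriver."),
        ("BodyBold", "Warning: unplug the widget first."),  # an ad hoc tag
        ("Step", "Remove the cover."),
    ]
    counts, stray = audit_tags(sample)
    for tag, n in counts.most_common():
        print(f"{tag:10} {n}")
    print("Not on the approved list:", stray)
```

Every tag in the “not approved” report is either a candidate for retagging before conversion or a sign that your mapping needs another rule.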

Tools

Google “DITA conversion free” and watch the results pile up. Although some file formats can’t be converted directly to DITA, almost anything can be converted to an intermediate format (HTML, for example) that can then be converted to DITA. The most important consideration, again, is how well the content is organized to begin with.

With an eye toward the two most common source formats (FrameMaker and Microsoft Word), here are several good options for basic conversion that should work right out of the box (and again, bear in mind that configuration and customization can go a long, long way toward improving your results).

FrameMaker-to-DITA:

  • FrameMaker conversion tables. A good conversion table will do the work for you. Map your paragraph styles to elements via the table, then run your content against it and voilà: FrameMaker will structure your unstructured content, which you can then save as XML. Conversion tables are fairly simple to set up, and, depending on your source content, can produce solid DITA output.
  • FrameScript and ExtendScript. If you’re using a FrameMaker release earlier than version 10, FrameScript (US $149.95) is a reasonably cheap macro scripting program that can help clean up your content both before and after structuring. If you’re using FrameMaker 10 or later, ExtendScript is built in. Note that scripts for the two are not interchangeable, and that ExtendScript, while free, has its own woes (see Simon Bate’s post on his early experiences with ExtendScript).
  • HTML-to-DITA via the DITA Open Toolkit. Save your FrameMaker files as HTML, clean them up using HTML Tidy (free, GUI wrappers available, can be run against multiple source files with standard batch commands), then run them through the DITA-OT’s h2d tool (also free).
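At heart, a conversion table is a set of rules that map paragraph tags to elements and wrap runs of those elements in containers. As a rough illustration of that idea only (this is not FrameMaker’s conversion table format, nor how FrameMaker implements it), the Python sketch below maps hypothetical tags onto a generic DITA topic and wraps consecutive Bullet paragraphs in a single list. It assumes one topic per input, starting with a Heading1 paragraph, and it omits the DOCTYPE declaration a real DITA file needs before validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag-to-element rules, analogous to rows in a conversion table.
# Paragraphs mapped to "li" are wrapped in one <ul> while they run consecutively.
STYLE_MAP = {"Heading1": "title", "Body": "p", "Bullet": "li"}


def to_topic(paragraphs, topic_id):
    """Build a generic DITA <topic> element from (tag, text) pairs."""
    paragraphs = list(paragraphs)
    if not paragraphs or STYLE_MAP.get(paragraphs[0][0]) != "title":
        raise ValueError("A topic must start with a Heading1 paragraph")

    topic = ET.Element("topic", id=topic_id)
    ET.SubElement(topic, "title").text = paragraphs[0][1]
    body = ET.SubElement(topic, "body")

    current_list = None  # the open <ul>, if the previous paragraph was a list item
    for tag, text in paragraphs[1:]:
        element = STYLE_MAP.get(tag)
        if element is None or element == "title":
            raise ValueError(f"Cannot map paragraph tag {tag!r} inside a topic body")
        if element == "li":
            if current_list is None:
                current_list = ET.SubElement(body, "ul")
            ET.SubElement(current_list, "li").text = text
        else:
            current_list = None  # any non-list paragraph closes the list
            ET.SubElement(body, element).text = text
    return topic


if __name__ == "__main__":
    sample = [
        ("Heading1", "Cleaning the widget"),
        ("Body", "You need:"),
        ("Bullet", "A soft cloth"),
        ("Bullet", "Warm water"),
        ("Body", "Wipe gently."),
    ]
    print(ET.tostring(to_topic(sample, "cleaning_widget"), encoding="unicode"))
```

Raising an error on an unmapped tag, rather than silently dropping it, is a deliberate choice here: it forces the cleanup conversation before conversion instead of after.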

MS Word-to-DITA:

  • DITA for Publishers Word-to-DITA plugin for the DITA Open Toolkit. Define a style-to-tag map and run your Word files through this plugin to produce workable DITA output. Customization can be quite tricky, but the plugin itself is a solid solution for basic conversion.
  • HTML-to-DITA (again). Much the same as with FrameMaker—save as HTML, clean files using HTML Tidy, then run them through the DITA-OT’s h2d tool.
  • Word-to-FrameMaker-to-DITA. In the unlikely event that you have both of these tools at your disposal, slurp the Word documents into FrameMaker, then use a conversion table to structure your content.
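Any style-to-tag map for Word starts with knowing which paragraph styles your documents actually use. A .docx file is a ZIP archive whose main text lives in word/document.xml, so the Python standard library can list each paragraph’s style without Word installed. This is a sketch, not part of the DITA for Publishers plugin, and it does not show that plugin’s map format. It reports style IDs (for example, “Heading1”), which can differ from the display names in Word’s style list, and it assumes unstyled paragraphs use a default style whose ID is “Normal”. The sample document built in memory is a hypothetical stand-in, not a complete Word file.

```python
import io
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def docx_paragraphs(source):
    """Yield (style_id, text) for every paragraph in a .docx path or file-like object."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # iter() also finds paragraphs nested in tables and text boxes
    for p in root.iter(W + "p"):
        style = p.find(f"{W}pPr/{W}pStyle")
        # Unstyled paragraphs use the default paragraph style; "Normal" is the
        # usual ID (an assumption; check word/styles.xml in your own files)
        style_id = style.get(W + "val") if style is not None else "Normal"
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        yield style_id, text


def make_sample_docx():
    """Build a minimal in-memory archive containing only word/document.xml."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
        '<w:r><w:t>Cleaning the widget</w:t></w:r></w:p>'
        '<w:p><w:r><w:t xml:space="preserve">Wipe </w:t></w:r>'
        '<w:r><w:t>gently.</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Pass a real .docx path on the command line, or fall back to the sample.
    source = sys.argv[1] if len(sys.argv) > 1 else make_sample_docx()
    for style_id, text in docx_paragraphs(source):
        print(f"{style_id:12} {text}")
```

Feeding the resulting pairs into a tag audit like the one in the suggestions above shows which styles your Word-to-DITA map must cover before you run the real conversion.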

And remember, when you are unhappily retagging your former coworker’s “very clean” files…at least it’s not Interleaf.