Migration for the un(der)funded
Content migration from format A to format B is a challenge in the best of times. And then there are the worst of times, like the depressing situation in this message (published with permission from the author):
How do you suggest handling this situation: We are a 5-person team that is already maxed out resource-wise, and who is working in an Agile environment with overlapping sprints. We are currently working in unstructured FrameMaker, although we are very conscientious about our tag use. We are being asked by corporate to move to DITA after the first of the year. We don’t feel we have the time or the resources and we will not be afforded the luxury of having someone else convert our documents. Help!
To summarize, a small team is being asked to take on conversion of their content without any support or accommodation. This is a failure of management. (“But,” splutter the managers reading this, “what if they are exaggerating their level of busyness in order to avoid the DITA conversion??”) If your team is actively trying to avoid DITA, that’s still the fault of management because you have not sold them on the process. So, this is either a legitimate “not enough resources” problem or it’s a change management problem. Both of those are the manager’s responsibility.
Let’s assume that our anonymous correspondent is not the only person facing this problem. Here are some suggestions (and we welcome additional input from our readers, anonymous or otherwise):
- Talk to management about the resource issue. Are they willing to negotiate on some other responsibilities to make room for the conversion project?
- 99% of our customers tell us that they are “very conscientious” about their tag use. Sometimes, that description is even accurate. Files with clean tagging are much easier to convert automatically than the other kind. Preconversion cleanup is much more efficient than postconversion cleanup.
- Know what you have. Topic-based content is much easier to convert to DITA, whereas conversion of poorly organized, repetitious content defeats the purpose altogether. If your content isn’t ready (from an information architecture standpoint), concentrate on getting it where it needs to be before worrying about conversion paths.
- Dig in to the free stuff. There are lots of great free tools out there that work quite well (see below). With a little customization, most of these can be, er, persuaded to do almost anything.
- Be reasonable. No conversion is perfect.
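One cheap way to test the “very conscientious” claim is to count which paragraph tags your files actually use. The sketch below is a rough illustration rather than a finished tool. It assumes you have saved a FrameMaker document as MIF (the text interchange format) and that paragraph tags appear as PgfTag statements quoted with a backtick and an apostrophe. The sample text and tag names are made up, and the paragraph catalog at the top of a real MIF file also contains PgfTag statements, so expect each cataloged tag to be counted at least once.

```python
import re
from collections import Counter

# MIF quotes strings with a backtick and a straight apostrophe: <PgfTag `Body'>
PGF_TAG = re.compile(r"<PgfTag\s+`([^']*)'>")

def count_paragraph_tags(mif_text):
    """Return how many times each paragraph tag name appears in MIF text."""
    return Counter(PGF_TAG.findall(mif_text))

# Made-up sample standing in for the contents of a real .mif file.
sample = """
<Para <PgfTag `Heading1'> <ParaLine <String `Overview'>>>
<Para <PgfTag `Body'> <ParaLine <String `First paragraph.'>>>
<Para <PgfTag `Body'> <ParaLine <String `Second paragraph.'>>>
<Para <PgfTag `body2'> <ParaLine <String `Stray one-off tag.'>>>
"""

# Print the most heavily used tags first; rare tags are cleanup candidates.
for tag, count in count_paragraph_tags(sample).most_common():
    print(f"{count:4d}  {tag}")
```

Tags that show up only once or twice (like the hypothetical body2 here) are usually the ones worth fixing before conversion rather than after.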
Tools
Google “DITA conversion free” and watch the results pile up. And although there are file formats that can’t be converted directly to DITA, most anything can be converted to an intermediate format that can then be converted to DITA. The most important consideration, again, is the organization of the content from the outset.
With an eye toward converting from the two most common source formats (FrameMaker and Microsoft Word), there are several good options for basic conversion that should work right out of the box (and again, bear in mind that configuration and customization can go a long, long way toward improving your results).
FrameMaker-to-DITA:
- FrameMaker conversion tables. A good conversion table will do the work for you. Map your paragraph styles to elements via the table, then run your content against it and voilà: FrameMaker will structure your unstructured content, which you can then save as XML. Conversion tables are fairly simple to set up, and, depending on your source content, can produce solid DITA output.
- FrameScript and ExtendScript. If you’re using a FrameMaker release earlier than 10, FrameScript (US $149.95) is a reasonably cheap macro scripting program that can help clean up your content both before and after structuring. If you’re using FrameMaker 10 or later, ExtendScript functionality comes built-in. Note that scripts for the two are not interchangeable, and that ExtendScript, while free, has its own woes (see Simon Bate’s post on his early experiences with ExtendScript).
- HTML-to-DITA via the DITA Open Toolkit. Save your FrameMaker files as HTML, clean them up using HTML Tidy (free, GUI wrappers available, can be run against multiple source files with standard batch commands), then run them through the DITA-OT’s h2d tool (also free).
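The idea behind a conversion table (map each paragraph style to an element, then apply the map) can be sketched in a few lines. To be clear, this is not FrameMaker’s conversion table format or anything FrameMaker runs; it is a toy illustration of the mapping concept. The style names, the sample paragraphs, and the function name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical style names; replace with the ones in your own templates.
STYLE_TO_ELEMENT = {"Heading1": "title", "Body": "p", "Note": "note"}

def paragraphs_to_topic(topic_id, paragraphs):
    """Build a DITA <topic> from (style, text) pairs using STYLE_TO_ELEMENT.

    Returns the topic element plus a list of styles that could not be placed.
    """
    topic = ET.Element("topic", id=topic_id)
    title = ET.SubElement(topic, "title")
    body = ET.SubElement(topic, "body")
    unmapped = []
    for style, text in paragraphs:
        element = STYLE_TO_ELEMENT.get(style)
        if element is None:
            unmapped.append(style)  # no rule for this style: manual cleanup
        elif element == "title":
            if title.text is None:
                title.text = text   # the first heading becomes the topic title
            else:
                unmapped.append(style)  # a second heading suggests a new topic
        else:
            ET.SubElement(body, element).text = text
    return topic, unmapped

topic, unmapped = paragraphs_to_topic("intro", [
    ("Heading1", "Overview"),
    ("Body", "First paragraph."),
    ("Note", "Back up first."),
    ("Sidebar", "No rule for this one."),
])
print(ET.tostring(topic, encoding="unicode"))
print("Unmapped styles:", unmapped)
```

The list of unmapped styles is the interesting part: every style without a rule is a paragraph somebody has to fix by hand, which is why clean, consistent tagging pays off.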
MS Word-to-DITA:
- DITA for Publishers Word-to-DITA plugin for the DITA Open Toolkit. Define a style-to-tag map and run your Word files through this plugin to produce workable DITA output. Customization can be quite tricky, but the plugin itself is a solid solution for basic conversion.
- HTML-to-DITA (again). Much the same as with FrameMaker—save as HTML, clean files using HTML Tidy, then run them through the DITA-OT’s h2d tool.
- Word-to-FrameMaker-to-DITA. In the unlikely event that you have both of these tools at your disposal, slurp the Word documents into FrameMaker, then use a conversion table to structure your content.
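For the HTML-to-DITA route, the cleanup step is easy to batch. Here is a minimal sketch that runs HTML Tidy over a folder of exported HTML files and writes XHTML copies to a second folder. It assumes the tidy command-line program is installed and on your PATH; the function names are invented for this example, and the h2d step is left out because its invocation depends on your DITA-OT setup.

```python
import subprocess
from pathlib import Path

def tidy_command(source, out_dir):
    """Build the HTML Tidy command line for one source file."""
    target = out_dir / source.with_suffix(".xhtml").name
    # -q: quiet, -asxhtml: emit XHTML, -utf8: UTF-8 in and out, -o: output file
    return ["tidy", "-q", "-asxhtml", "-utf8", "-o", str(target), str(source)]

def tidy_all(src_dir, out_dir):
    """Run Tidy on every .htm/.html file in src_dir; return the files that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = []
    for source in sorted(src_dir.glob("*.htm*")):
        result = subprocess.run(tidy_command(source, out_dir))
        # Tidy exits with 0 for clean input, 1 for warnings, 2 for errors.
        if result.returncode >= 2:
            failures.append(source)
    return failures
```

Calling tidy_all(Path("html_export"), Path("xhtml_clean")) (both folder names are placeholders) gives you a short list of files that Tidy could not repair, which are the ones to open by hand before running the rest through h2d.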
And remember, when you are unhappily retagging your former coworker’s “very clean” files…at least it’s not Interleaf.
Yves Barbion
Another great and “almost free” tool for conversion to DITA is MIF2Go. You typically use it for FrameMaker-to-DITA conversions, but you can also open/import Word files into FrameMaker and then convert those to DITA. MIF2Go is completely free for “students, faculty, and staff, unemployed, for underemployed consultants (you decide if you are), for most nonprofits, and for developers and documenters of FOSS” (Jeremy H. Griffith).
Ryan Fulcher
Hi Yves,
Agreed. MIF2Go is a great tool for FrameMaker file manipulation. Especially useful is the HTML splitting feature, which allows you to split out topics by paragraph tag into separate HTML files, which you can then run through the DITA-OT’s HTML-to-DITA utility. Thanks for bringing it up!
Yves Barbion
Hi Ryan,
With MIF2Go, there is no need to convert to HTML first. You can convert FrameMaker files (or Word files opened in FrameMaker) to DITA topics and DITA maps straight away. One of the many things I like about MIF2Go is that you can automatically add information (elements and attributes) that is not in your unstructured source files, for example shortdesc, abstract, and specific attribute values.
Ryan Fulcher
Hi Yves,
[slaps forehead] Right you are, of course. I’ve been doing a good bit of customization work to the h2d utility lately (for a project not involving FrameMaker, I should add), so I must just have it on the brain.