A DITA implementation isn’t merely a matter of picking tools. Several factors, including wrangling the different groups affected by an implementation, are critical to successfully managing DITA projects.
Note: This post assumes you have already done a content strategy analysis, which determined the DITA standard supports your company’s business goals.
Good content addresses its audience at the right level. During a DITA implementation, you must do the same when working with the different groups affected by the DITA project. Talk to them about their particular concerns and requirements in language they understand.
For example, engineers who occasionally provide bits of content don’t care about all the fancy features of the XML authoring tool used by the tech comm group. The engineers are far more interested in contributing content as quickly and easily as possible (through a simple web interface, perhaps). Also, engineers will understand the value of DITA’s topic-based, modular approach when you compare DITA to object-oriented programming.
Even though you’ve chosen the DITA standard as your XML model, you still have to do content modeling. You can start by creating a spreadsheet cataloging the tags in your current template and then list out the DITA equivalents. Seeing how existing information works in DITA is a good way to learn the DITA structures—but be careful not to focus exclusively on how you created content in the past. You probably didn’t follow DITA best practices in your old content, so be aware of what you may need to change—or throw out altogether—to develop quality DITA content.
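If you'd rather prototype that catalog in a script before building the spreadsheet, here is a minimal sketch. The legacy style names and the DITA elements they map to are made-up examples, not a recommended mapping; the point is simply to surface the styles that have no obvious DITA home and therefore need a modeling decision.

```python
# Hypothetical catalog: legacy template style names mapped to DITA elements.
# Both the style names and the chosen mappings are illustrative assumptions.
LEGACY_TO_DITA = {
    "Heading1": "title",
    "Body": "p",
    "StepNumbered": "step",
    "NoteText": "note",
    "CodeSample": "codeblock",
}


def catalog_styles(styles_in_use):
    """Split legacy styles into mapped ones and ones needing a modeling decision."""
    mapped = {}
    unmapped = []
    for style in styles_in_use:
        if style in LEGACY_TO_DITA:
            mapped[style] = LEGACY_TO_DITA[style]
        else:
            unmapped.append(style)
    return mapped, sorted(unmapped)


# "RedBoldOverride" stands in for a formatting override with no DITA equivalent.
mapped, unmapped = catalog_styles(["Heading1", "Body", "RedBoldOverride", "NoteText"])
print(mapped)
print(unmapped)
```

Anything that lands in the unmapped list is a candidate for changing or throwing out, rather than something to force into DITA.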
You also need to figure out how to track metadata (data about your data). In DITA, much of your metadata goes into element attributes. You can use those attribute values in many ways, including filtering what content does and does not go into output (conditional text), narrowing your searches of DITA source files, and carrying information that particular forms of output need during processing (for example, the IDs for context-sensitive help).
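To make attribute-based filtering concrete, here is a rough sketch that uses only the Python standard library. It assumes a tiny sample topic and an `audience` attribute; in a real project, filtering is normally handled by your publishing toolchain (for example, with a DITAVAL file), so treat this as an illustration of the idea rather than how any particular tool implements it.

```python
import xml.etree.ElementTree as ET

# Sample topic invented for this sketch.
TOPIC = """<topic id="install">
  <title>Installing</title>
  <body>
    <p>Run the installer.</p>
    <p audience="administrator">Set the license server address.</p>
    <p audience="novice expert">Restart the application.</p>
  </body>
</topic>"""


def filter_by_audience(root, excluded):
    """Remove elements whose audience values are ALL in the excluded set."""
    for parent in list(root.iter()):
        for child in list(parent):
            values = child.get("audience", "").split()
            if values and all(value in excluded for value in values):
                parent.remove(child)
    return root


root = filter_by_audience(ET.fromstring(TOPIC), {"administrator"})
remaining = [p.text for p in root.iter("p")]
print(remaining)
```

The untagged paragraph always survives, and the paragraph tagged for two audiences stays because neither of its values was excluded.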
Metadata isn’t just at the topic or paragraph level, either; you need to think about its application at the map file level. (The map file is what collects topics together for an information product such as a PDF book, a help set, an ebook, and so on.) In map files, there are elements expressly for tracking publication-level information: publication date, version, release, and so on. Don’t fall into the trap of thinking metadata = attribute.
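As a small illustration of map-level metadata living in elements, the sketch below reads publication details out of a sample map. The map content is invented, and the element choices (`critdates`, `prodinfo`, `vrm`) are just one plausible way to record this information; your content model may use different elements, such as those in a bookmap.

```python
import xml.etree.ElementTree as ET

# Sample map invented for this sketch.
MAP = """<map>
  <title>Widget User Guide</title>
  <topicmeta>
    <critdates>
      <created date="2013-05-01"/>
    </critdates>
    <prodinfo>
      <prodname>Widget</prodname>
      <vrmlist>
        <vrm version="2" release="1"/>
      </vrmlist>
    </prodinfo>
  </topicmeta>
  <topicref href="install.dita"/>
</map>"""


def publication_info(map_xml):
    """Collect publication-level metadata from elements in the map."""
    root = ET.fromstring(map_xml)
    vrm = root.find("topicmeta/prodinfo/vrmlist/vrm")
    created = root.find("topicmeta/critdates/created")
    return {
        "title": root.findtext("title"),
        "product": root.findtext("topicmeta/prodinfo/prodname"),
        "version": vrm.get("version") if vrm is not None else None,
        "release": vrm.get("release") if vrm is not None else None,
        "created": created.get("date") if created is not None else None,
    }


info = publication_info(MAP)
print(info)
```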
There are other content modeling considerations. Do you need to create a new element or attribute type through the process of specialization (creating a new structure based on an existing one)? Also, how are you going to use special DITA constructs, such as conrefs (reusing a chunk of text by referencing it) and keys (indirect references that can serve as variables for product names and other often-used words and phrases)?
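Here is a simplified sketch of the variable-style reuse described above: a map defines a key that holds the product name, and topics point at that key instead of hard-coding the text. The sample map, topic, and key name are invented, and the resolution logic is deliberately naive compared to what a real DITA processor does.

```python
import xml.etree.ElementTree as ET

# Sample map and topic invented for this sketch.
MAP = """<map>
  <keydef keys="product-name">
    <topicmeta>
      <keywords><keyword>Acme Widget</keyword></keywords>
    </topicmeta>
  </keydef>
</map>"""

TOPIC = """<topic id="intro">
  <title>About <keyword keyref="product-name"/></title>
  <body><p>Install <keyword keyref="product-name"/> first.</p></body>
</topic>"""


def collect_keys(map_root):
    """Build {key name: replacement text} from keydef elements in a map."""
    keys = {}
    for keydef in map_root.iter("keydef"):
        text = keydef.findtext("topicmeta/keywords/keyword")
        for name in keydef.get("keys", "").split():
            keys.setdefault(name, text)  # keep the first definition encountered
    return keys


def resolve_keyrefs(topic_root, keys):
    """Fill empty keyword elements with the text defined for their key."""
    for keyword in topic_root.iter("keyword"):
        name = keyword.get("keyref")
        if name in keys and not (keyword.text or "").strip():
            keyword.text = keys[name]
    return topic_root


keys = collect_keys(ET.fromstring(MAP))
topic = resolve_keyrefs(ET.fromstring(TOPIC), keys)
title_text = "".join(topic.find("title").itertext())
para_text = "".join(topic.find("body/p").itertext())
print(title_text)
print(para_text)
```

Change the product name once in the map, and every topic that references the key picks up the new text.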
Choosing a tool is not a content strategy. Choosing tools that support DITA does not mean you have a DITA strategy.
Ensure those evaluating tools aren’t suffering from “tool myopia.” They should not use their current tool’s capabilities as benchmarks for a new tool, particularly when a group is moving from a desktop publishing tool to an XML editor.
Also, tool requirements go beyond primary content creation. Think about the entire workflow; get input from all the groups affected by your DITA implementation, and remember that a tool for one group may not be a good fit for another.
Create a weighted spreadsheet so that the really important requirements get precedence during ranking. Also, you don’t have to limit your requirements to “yes or no” questions. If necessary, create short narratives that explain specific use cases and ask vendors to demonstrate how their tool supports those use cases.
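The arithmetic behind a weighted spreadsheet is simple enough to sketch in a few lines. The requirements, weights, tool names, and scores below are all hypothetical; substitute your own.

```python
# Hypothetical requirements with weights (higher = more important).
WEIGHTS = {
    "conref support": 5,
    "web-based review": 3,
    "track changes": 2,
}

# Hypothetical scores (0 = not supported, 5 = fully supported) for two made-up tools.
SCORES = {
    "Tool A": {"conref support": 5, "web-based review": 3, "track changes": 2},
    "Tool B": {"conref support": 2, "web-based review": 5, "track changes": 5},
}


def weighted_total(scores, weights):
    """Sum each requirement's score multiplied by its weight."""
    return sum(weights[req] * scores.get(req, 0) for req in weights)


totals = {tool: weighted_total(scores, WEIGHTS) for tool, scores in SCORES.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
print(totals)
print(ranking)
```

In this made-up data, Tool B has the higher raw score (12 versus 10), but Tool A comes out ahead once the weights are applied (38 versus 35) because it is stronger on the requirement that matters most.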
Parse vendor claims carefully. Ask specific questions about a tool’s support of DITA constructs: How does your tool support the use of conrefs? If you’re not comfortable with the answers you get from a vendor during the evaluation process, you’ll continue to be uncomfortable (and unhappy) after you purchase a tool from them.
Strongly consider involving a third-party consultant (yes, Scriptorium!) in developing requirements and vetting vendors. The consultant can help you cut through the bunk and also act as the “bad cop.”
Default outputs (PDF, HTML, and help) generated from the DITA Open Toolkit are hideous, but fixable. Don’t be scared off by the ugliness of default output, and don’t let project naysayers use the default formatting as a cudgel: “See, we can’t use DITA because the Open Toolkit doesn’t create decent output!”
Take a software development approach to your outputs. For PDF content, you can create requirements that specify the formatting for different heading levels, admonitions (notes, cautions, and warnings), body paragraphs, steps in procedures, tables (both for table formatting itself and the text in the table), and so on. If you want to re-create what you have in a template file for your current text processing tool, you can take all the formatting aspects (font, line spacing, indent, color, and so on) from it. Detailed specs are going to make it easier to modify the stylesheets that transform your DITA XML into PDF, HTML, and other outputs.
Get significant budget for modifying the transformation stylesheets. PDF transforms are the most complex and expensive; transforms for web pages, ebooks, and so on may run less, but that depends on their complexity. Even if you have transform experience in house—and most companies don’t—you still need to block off that person’s time, and that is an expense as well.
Before you even think about conversion, research DITA best practices. Reading the DITA spec won’t help you with that. Instead, get a book such as The DITA Style Guide (download the free EPUB edition) that offers advice on the best ways to implement the many elements and features in the DITA model.
If your legacy content is not tagged consistently and has lots of formatting overrides, you cannot convert that content cleanly through scripting. Also, if your content is not easily broken into chunks that are the equivalent of DITA topics, you may be better off just starting over.
Even if you’re not going to convert legacy information, you still need content to test your DITA model and the transforms for output. Create what I call a “greatest hits collection” of DITA files that represents real-world content you create and distribute. I’d recommend 50 to 100 pages of content to be sure you’re thoroughly testing your processes and outputs.
I have yet to see a conversion project completed without some sort of complication: bad tagging, layout quirks in your source, or some other lurking horror is going to pop up. Resign yourself to dealing with these surprises, and give yourself lots of lead time so you can handle those issues.
Read more tips on converting legacy content to DITA.
Have questions about managing your DITA project? Contact us. Also, watch this recording of my webcast on DITA implementations: