April 18, 2013

Five tips for converting content to DITA

So, you’ve decided to move to a DITA-based workflow. Before you convert your existing content to DITA, consider these five tips, which encompass both big-picture and coding-specific issues.

1. Read a book about DITA best practices before you convert. Educating yourself about what coding works well (and doesn’t work so well) in the real world can save you a lot of headaches and rework. Merely reading the DITA specification is not going to give you advice, for example, on the best way to code commands in your content. The DITA Style Guide by Tony Self is a good resource, and I’m not saying that just because Scriptorium Press published it. That book has given me really useful information while I worked on DITA projects.
2. Don’t assume the first sentences in a section are the short description for a DITA topic. There is a strong temptation to convert the first bit of information in a section to a topic’s short description (shortdesc element). Don’t succumb to that temptation. In my experience, it is rare that the first sentences in legacy content are a true short description, which should be a standalone summary of a topic’s content. For information on best practices for short descriptions and how different outputs (HTML, PDF, help) from the DITA Open Toolkit use shortdesc elements, see Kristen Eberlein’s Art of the short description.
3. Use the right topic types for your content. DITA offers four topic types: generic topic, concept, task, and reference, and you should convert your content to match the purposes of those topic types. For example, don’t shoehorn a procedure better suited to a task topic into a concept topic. Yes, the DITA spec will let you code an ordered list in a concept topic, and that list may seem like a sensibly coded task. However, when it comes time to transform your DITA content to HTML or PDF, the styling for procedures may rely on coding specific to the task element. An ordered list in a concept may not be formatted the same.
4. Consider how cross-references are processed by the DITA Open Toolkit. During conversion, it is a good idea to add ID attributes to items that are commonly referenced (tables and figures, for example); you need those ID attributes to create cross-references to elements. However, just because the DITA spec enables you to put an ID attribute on the title element within a fig or table element, that does not mean you should point to that title element when creating cross-references. For example, in output based on the default XHTML plugin that comes with the DITA Open Toolkit, a cross-reference to a figure will not work when the xref element points to the title element within the fig element instead of the fig element itself.
5. Know that valid DITA content is not the same as good DITA content. Don’t be fooled when a conversion vendor makes a big deal about how quickly it can convert your legacy information into valid DITA. The problems I mentioned in tips 2–4 can exist in valid DITA topics. The validation feature in a DITA authoring tool is not going to tell you, for example, that the two sentences you converted to a short description are not a true short description. Valid DITA ≠ semantically correct, useful DITA.
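
To make tips 3 and 4 concrete, here is a minimal Python sketch of a post-conversion check that flags two problems validation alone would not report: a cross-reference that points to a title inside a fig or table, and an ordered list sitting in a concept topic. Everything in it is an assumption for illustration: the sample topic, its IDs, the function name `lint_topic`, and the warning text are all hypothetical, and the DOCTYPE is omitted for brevity. It only looks at references within the same file, and an ordered list in a concept is not always a procedure, so treat the output as a prompt for human review, not a verdict.

```python
import xml.etree.ElementTree as ET

# Hypothetical converted topic. It is structurally acceptable as a concept,
# yet it contains two of the problems described in tips 3 and 4.
SAMPLE = """\
<concept id="install_overview">
  <title>Installing the widget</title>
  <conbody>
    <p>See <xref href="#install_overview/fig_parts_title"/> and
       <xref href="#install_overview/fig_parts"/>.</p>
    <fig id="fig_parts">
      <title id="fig_parts_title">Widget parts</title>
      <image href="parts.png"/>
    </fig>
    <ol>
      <li>Unpack the widget.</li>
      <li>Attach the base.</li>
    </ol>
  </conbody>
</concept>
"""


def lint_topic(xml_text):
    """Return a list of warnings for one DITA topic (illustrative checks only)."""
    root = ET.fromstring(xml_text)
    warnings = []

    # Collect IDs that sit on a <title> directly inside a <fig> or <table>.
    title_ids = set()
    for parent in root.iter():
        if parent.tag in ("fig", "table"):
            title = parent.find("title")
            if title is not None and title.get("id"):
                title_ids.add(title.get("id"))

    # Tip 4: flag same-file xrefs of the form "#topicid/elementid"
    # whose element ID belongs to one of those titles.
    for xref in root.iter("xref"):
        href = xref.get("href", "")
        if href.startswith("#") and "/" in href:
            element_id = href.rsplit("/", 1)[1]
            if element_id in title_ids:
                warnings.append(
                    f"xref {href!r} points to a title; "
                    "point to the fig or table instead"
                )

    # Tip 3: an ordered list in a concept may be a procedure in disguise.
    if root.tag == "concept" and root.find(".//ol") is not None:
        warnings.append(
            "concept contains an ordered list; consider a task topic"
        )

    return warnings


if __name__ == "__main__":
    for warning in lint_topic(SAMPLE):
        print(warning)
```

Run against the sample, the sketch reports the xref that targets `fig_parts_title` and leaves the xref that targets `fig_parts` alone. A check like this cannot judge whether a short description is a true standalone summary (tip 2); that still takes a person.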

There are many other tips I could offer, but these five are a good starting point. Feel free to share your own conversion tips and war stories below.

About the author

Alan Pringle

Content strategy. Content operations. Eating (preferably pastries and chocolate). COO at Scriptorium.

7 comments on “Five tips for converting content to DITA”

Leigh White
April 18th, 2013 - 5:07pm

Boy, where can I start with this?! With #5, you’re a man after my own heart. A way-too-common writer refrain is, “But it’s valid. The spec says it’s okay.” In fairness, though, writers hear you talking out of both sides of your mouth…one side says to them, “You must adhere to the standard” and the other side says, “Except when the standard is too permissive.” There’s so much of a “feel” aspect of authoring, even in a standard like DITA, and I use that to counteract writers’ fears that moving to structured authoring will turn them into automatons.

If I were going to add a #6, it would be “If you’re hoping to automate your conversion, make sure your input is uniform.” Your FrameMaker or Word template can be weird…that’s okay. Those of us who write conversion scripts can work with weird. What we *can’t* work with is random.

Great post!
  
  Alan Pringle
  April 19th, 2013 - 7:52am
  
  And I will append 6a, Leigh:
  
  If your files are so full of formatting overrides and tweaks (for example, creating side heads by inserting a disconnected text frame for each heading*), automated conversion may not be a possibility.
  
  * I didn’t make this up, unfortunately.

Sarah O'Keefe
April 19th, 2013 - 8:09am

7. If your legacy content is not already topic-based, the most efficient approach is probably to outline your modules and then rewrite the content into topics, using a few scraps here and there.

Ben
April 20th, 2013 - 11:58am

By far, the number one thing that must be considered is content strategy. Only then can you make a competent decision on which content model – whether that’s DITA or something else – will best fit what it is you want to do with your content.

I’ve seen plenty of organizations implement DITA first and only then try to figure out how a DITA workflow might enable them to do something different with their content. This is probably the worst way to implement DITA: you spend a ton of money and effort to less efficiently produce the same output that you were producing before.

Most tech pubs organizations are, for better or worse, still operating in the book-based world, producing PDFs, some flavor of tri-pane online help, and possibly ebooks. DITA, in my opinion, is poorly suited for this type of a workflow (though from a purely technological perspective, it can be made to work). If you haven’t carefully planned a content strategy for which DITA is optimal (i.e. delivering actual topics), then chances are that your DITA implementation will be unsuccessful.

Of course, even if you have designed a topic-based content strategy up front, you may find that DITA isn’t ideal for your requirements. For example, most enterprise-level DITA publishing tools seem to be geared towards book publication.

Alexander
April 23rd, 2013 - 7:31am

Thanks for the post Alan, very valid points!
Good start for those who are going to dive into this conversion…

Christina Brunk
May 9th, 2013 - 4:23pm

Oh, how I wish you would have written this list PRIOR to doing my conversion! Or wait, did you write this list BECAUSE of my conversion?
Regardless, I whole-heartedly agree…especially with #2!
  
  Alan Pringle
  May 9th, 2013 - 6:14pm
  
  As the credits before a movie say: “Inspired by true events.” But those events weren’t all on your project! Number 4 was heavily inspired by your second conversion effort, I will admit.