Tips for converting Microsoft Word to DITA
A common requirement for many digital transformation projects is converting Word-based content into DITA XML.
Consider these factors to ensure a successful conversion effort:
- Consistent styling and organization
- Breaking Word documents into separate topics
- Who performs the conversion
Consistent styling and organization provide better results
The success of your conversion is strongly linked to two best practices for Word documents:
- Consistent styling
- Consistent organization
If either one is missing, developing an automated conversion path will be difficult; if both are missing, your conversion will require a lot of manual work.
An essential part of the conversion is mapping identifiable parts of Word documents to DITA markup. Consistent use of styles across your Word documents is a great help in developing reliable mappings.
If your documents don’t use styles consistently, content that at least looks consistent is a start. An automated conversion can likely rely on that manual formatting to establish mappings, because conversion scripts can key on visual consistency.
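As a rough illustration of such a mapping, the sketch below looks up a paragraph's named style first and falls back to its manual formatting. The style names, the DITA targets, and the 14-point threshold are assumptions for illustration only, not a recommended mapping.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical style names and DITA targets; replace with your own mapping.
STYLE_TO_DITA = {
    "Heading 1": "title",
    "Body Text": "p",
    "List Bullet": "li",
    "Code": "codeblock",
}


@dataclass
class Paragraph:
    text: str
    style: Optional[str] = None  # named Word style, if one was applied
    font_size: float = 11.0      # manual formatting, in points
    bold: bool = False


def dita_element_for(para: Paragraph) -> str:
    """Pick a DITA element: named style first, then visual formatting."""
    if para.style in STYLE_TO_DITA:
        return STYLE_TO_DITA[para.style]
    # Fallback for content that only looks like a heading (assumed rule).
    if para.bold and para.font_size >= 14:
        return "title"
    return "p"
```

The fallback rule only works if the manual formatting is applied the same way everywhere, which is why visual consistency matters even when styles are missing.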
Word documents are a continuous stream of paragraphs. The formatting used on any given paragraph has no formal association with, or dependency on, the styles of the preceding or following paragraphs.
DITA, on the other hand, is built on the concept of hierarchical documents. The DITA content model describes required element sequences and which elements can be children of a given element.
Consistent use of styles and formatting in the Word files exposes an informal structure that a conversion can map to DITA's hierarchy (a 30-point heading means this chunk of content is a second-level section, for example).
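To make the idea concrete, here is a minimal sketch that turns a flat stream of paragraphs into nested sections based on heading size. The size-to-level table is hypothetical (only the 30-point, second-level entry comes from the example above), and `Section` and `build_hierarchy` are illustrative names rather than part of any conversion tool.

```python
from dataclasses import dataclass, field

# Hypothetical font-size-to-level rules; 30 pt -> level 2 follows the example above.
HEADING_LEVELS = {36: 1, 30: 2, 24: 3}


@dataclass
class Section:
    title: str
    level: int
    body: list = field(default_factory=list)      # paragraph text
    children: list = field(default_factory=list)  # nested Section objects


def build_hierarchy(paragraphs):
    """Turn a flat list of (font_size, text) pairs into nested sections."""
    root = Section(title="(document)", level=0)
    stack = [root]  # open sections, outermost first
    for font_size, text in paragraphs:
        level = HEADING_LEVELS.get(font_size)
        if level is None:
            # Not a heading: attach the text to the innermost open section.
            stack[-1].body.append(text)
            continue
        # Close any open sections at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        section = Section(title=text, level=level)
        stack[-1].children.append(section)
        stack.append(section)
    return root
```

If headings are formatted inconsistently (say, 30 points in one file and 28 in another), a lookup like this silently treats them as body text, which is where the manual cleanup effort tends to come from.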
Breaking Word documents into separate topics
Most DITA content is organized into topics, each documenting a single unit of content: an idea, a task, or a set of information. These topics are organized into a hierarchy by a map or a series of maps. A single Word file, on the other hand, can contain a single topic, a chapter, or an entire book.
When planning a conversion, consider how your Word files are organized and how that organization correlates to DITA topics and maps. Generally, your conversion process will break up Word documents into multiple DITA topic files.
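The splitting step might look something like the sketch below, which starts a new topic at every heading and records each topic in a map. It assumes the Word content has already been reduced to `(is_heading, text)` pairs; the `slugify` naming scheme and the flat, single-level map are simplifications chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET


def slugify(title):
    """Make a file-name-safe ID from a heading (hypothetical naming scheme)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def split_into_topics(paragraphs, map_title):
    """Split (is_heading, text) pairs into DITA topics plus a map.

    Returns (topics, ditamap), where topics maps file names to <topic> elements.
    """
    topics = {}
    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = map_title
    body = None
    for is_heading, text in paragraphs:
        if is_heading:
            # Each heading starts a new topic file and a topicref in the map.
            topic_id = slugify(text)
            topic = ET.Element("topic", id=topic_id)
            ET.SubElement(topic, "title").text = text
            body = ET.SubElement(topic, "body")
            filename = f"{topic_id}.dita"
            topics[filename] = topic
            ET.SubElement(ditamap, "topicref", href=filename)
        elif body is not None:
            ET.SubElement(body, "p").text = text
        # Text before the first heading is skipped in this sketch.
    return topics, ditamap
```

A real conversion would also need to guarantee unique, valid XML IDs (a heading that starts with a digit would not produce one here), nest topicrefs to reflect heading levels, and write the DOCTYPE declarations that DITA tools expect.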
Who performs the conversion
For a Word conversion project, resource possibilities include:
- In-house talent
- Data conversion agencies
- Content strategy consultants (Scriptorium)
You may rely on each of these groups in some capacity during your conversion effort.
If you have available resources in-house, the person or group doing the conversion is likely already familiar with your content and perhaps with how your Word files are set up. The knowledge of your content set is a big advantage. That said, it’s rare to have significant in-house resources that are available for a tedious months-long conversion effort, and the people who are experts in your content may not be experts in writing conversion scripts.
If you don’t have knowledgeable resources in-house, consider using a data conversion agency. For larger sets of Word content with more variance in styling and organization, we usually involve a data conversion agency.
Data conversion agencies
Most data conversion agencies have tools to automate conversions. They can rely on past experience to help you find the best solutions for overcoming challenging aspects of your Word content.
To shape your conversion, prepare to spend a good deal of time describing how your Word content should map to DITA. A spreadsheet showing how a particular Word style or formatting maps to certain DITA structures is a good foundation for the mapping work. (Scriptorium does provide this support in many projects.)
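One way to put such a spreadsheet to work is to export it as CSV and check it against the styles actually used in your Word files. The column names (`word_style`, `dita_element`, `notes`) and the sample rows below are assumptions for illustration; an agency may ask for a different layout.

```python
import csv
import io

# Hypothetical CSV export of the mapping spreadsheet.
MAPPING_CSV = """word_style,dita_element,notes
Heading 1,title,Starts a new topic
Body Text,p,
List Bullet,li,Wrap consecutive items in ul
Warning,note,Set type to warning
"""


def load_mapping(csv_text):
    """Read the spreadsheet export into a {Word style: DITA element} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["word_style"].strip(): row["dita_element"].strip() for row in reader}


def unmapped_styles(styles_in_use, mapping):
    """List styles found in the Word files that the spreadsheet doesn't cover."""
    return sorted(set(styles_in_use) - set(mapping))
```

Running a check like `unmapped_styles` early tends to surface the one-off styles that need a mapping decision before the automated conversion starts.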
A conversion agency will often focus on the mechanical aspects of the conversion. It might not be able to help much with using DITA appropriately for your particular content or with deciding which DITA mechanisms to implement to support reuse, conditional text, and other efficiencies. You can work with a content strategy consultancy (like us!) to establish a DITA content model that best fits your content requirements.
Scriptorium conversion support
Consultants like Scriptorium focus on understanding content issues in both the Word input and the DITA output.
That focus is particularly valuable if your DITA content requires specialized (custom) elements. We can help you develop a model with customizations supporting your specific content requirements.
If you have an extensive content set (and nearly all of our clients do!), Scriptorium will build out the content model and the mapping, and then work with an agency for the conversion work.
Are you facing a Word to DITA conversion? Not sure where to start? Contact us to discuss your options.