May 16, 2022

Mapping from custom XML to DITA

If you were an early adopter of structured content, there’s a good chance that you have a custom XML content model. This article describes the process Scriptorium uses to move content from custom XML to DITA.

Planning the transition

You can think of any content model as having a physical shape. When you move from one content model to another, your old shape may or may not fit into the new shape. The first step, then, is to assess the existing content model to understand its elements, attributes, relationships, and features. You may have documentation of the legacy content model that you can lean on. Unfortunately, it’s common to have outdated or incomplete documentation.

Mapping tags

After assessing the current content model, you need to map it to baseline DITA. This exercise gives you a good sense of how well the two models align. For example, DITA has a <p> tag for regular block paragraphs, and you’ll see something similar, like a <para>, in many other content models. The legacy equivalent of a DITA <title> might be a <heading1> or a series of <h1>, <h2>, and <h3> tags.
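
A first-pass mapping like this can be prototyped in a few lines of Python. The sketch below is only an illustration: the TAG_MAP entries and the sample document are assumptions, and a real mapping would come out of your own assessment. It renames the tags that have a direct DITA equivalent and reports the ones that do not.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy-to-DITA tag names; a real table comes from your assessment.
TAG_MAP = {"para": "p", "heading1": "title", "h1": "title"}

def rename_tags(root, tag_map):
    """Rename mapped elements in place and return the set of unmapped tag names."""
    unmapped = set()
    for elem in root.iter():
        if elem.tag in tag_map:
            elem.tag = tag_map[elem.tag]
        else:
            unmapped.add(elem.tag)
    return unmapped

# Made-up legacy fragment for illustration.
legacy = ET.fromstring(
    "<section><heading1>Setup</heading1><para>Plug it in.</para>"
    "<topple>May tip over.</topple></section>"
)
unmapped = rename_tags(legacy, TAG_MAP)
result = ET.tostring(legacy, encoding="unicode")
```

The unmapped set is the useful part: it is a rough list of the gaps between the legacy model and baseline DITA.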

More often, you’ll run into some challenges with tags and metadata created for your specific content. For example, DITA has <note> for notes, cautions, and warnings, but you may have a specific <topple> tag for warnings about items that can tip over. If there is a gap in the DITA baseline mapping, you have several options to address the problem.
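
As one possible option among several, a custom tag can be folded into the closest baseline element and its original meaning kept in an attribute. The Python sketch below assumes that <topple> should become a <note> with type="warning", and that the outputclass attribute is an acceptable place to record the legacy tag name. Both choices are assumptions for illustration, not a recommendation for every content set.

```python
import xml.etree.ElementTree as ET

def topple_to_note(root):
    """Convert each legacy <topple> to <note type="warning" outputclass="topple">."""
    converted = 0
    # Materialize the matches first so renaming cannot affect the traversal.
    for elem in list(root.iter("topple")):
        elem.tag = "note"
        elem.set("type", "warning")
        elem.set("outputclass", "topple")  # keeps the legacy meaning visible
        converted += 1
    return converted

# Made-up legacy fragment for illustration.
doc = ET.fromstring("<body><p>Install the shelf.</p><topple>May tip over.</topple></body>")
converted = topple_to_note(doc)
note = doc.find("note")
```

This keeps the content in baseline DITA at the cost of a weaker semantic distinction than a dedicated element would give you.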

Handling metadata

Most organizations have custom metadata, and that metadata likely doesn’t quite match the DITA metadata framework. You’ll want to compile a list of existing metadata and then figure out how to map it to DITA and where changes or extensions are required.
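
Compiling that list by hand is tedious, so it can help to script the inventory. The Python sketch below counts how often each attribute appears on each element. The attribute names in the sample (model, reader, rev) are invented, and the sketch only covers metadata stored as attributes, not metadata stored in dedicated elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def inventory_attributes(root):
    """Count occurrences of each (element name, attribute name) pair."""
    counts = Counter()
    for elem in root.iter():
        for name in elem.attrib:
            counts[(elem.tag, name)] += 1
    return counts

# Made-up legacy fragment; attribute names are hypothetical.
sample = ET.fromstring(
    '<doc model="X200"><para reader="admin">A</para>'
    '<para reader="user" rev="2">B</para></doc>'
)
counts = inventory_attributes(sample)
```

Running this across the full legacy content set shows which metadata is actually in use, which is often a shorter list than the documentation suggests.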

Looking at links, hierarchy, and sequencing

DITA’s map files, which provide content hierarchy and sequencing of topics (like a table of contents), may or may not have a direct equivalent in the legacy files. You’ll need to figure out how to build the map file from the logic inherent in the legacy files.
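
When the legacy hierarchy is expressed through nesting, a map can be derived by walking that nesting. The Python sketch below assumes that each <chapter> and <section> becomes one topic, that every such element has an id, and that the topic file is named after that id. All three are assumptions about a hypothetical legacy model.

```python
import xml.etree.ElementTree as ET

# Assumption: these legacy elements each become one DITA topic.
TOPIC_LEVEL = {"chapter", "section"}

def build_topicrefs(legacy_parent, map_parent):
    """Mirror the nesting of topic-level legacy elements as nested <topicref>s."""
    for child in legacy_parent:
        if child.tag not in TOPIC_LEVEL:
            continue  # titles, paragraphs, and so on do not appear in the map
        topic_id = child.get("id")
        if topic_id is None:
            raise ValueError(f"<{child.tag}> has no id to derive a file name from")
        ref = ET.SubElement(map_parent, "topicref", href=f"{topic_id}.dita")
        build_topicrefs(child, ref)

# Made-up legacy book for illustration.
legacy = ET.fromstring(
    "<book>"
    '<chapter id="install"><title>Install</title>'
    '<section id="prereqs"/><section id="steps"/></chapter>'
    '<chapter id="usage"><title>Usage</title></chapter>'
    "</book>"
)
ditamap = ET.Element("map")
build_topicrefs(legacy, ditamap)
hrefs = [ref.get("href") for ref in ditamap.iter("topicref")]
```

If the legacy files express sequence some other way, for example through a separate build list, that source would drive the map instead of the element nesting.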

Links can also be challenging. A link to an external website is relatively easy to build in DITA. But links in and among your files are likely more challenging, especially if you are starting from chapter-level files and converting them to a group of DITA topics.
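
The reason internal links get harder is that a target that used to live in the same chapter file may now live in a different topic file. One way to handle this is to index every ID after the split and then rewrite each link against that index. The Python sketch below assumes a hypothetical legacy <link target="..."> element and assumes that legacy IDs are unique across the content set. The href pattern it generates, file.dita#topicid/elementid, is the standard DITA form for pointing at an element inside a topic.

```python
import xml.etree.ElementTree as ET

def build_id_index(topic_files):
    """Map each ID to a DITA href. topic_files maps file name -> parsed topic root."""
    index = {}
    for filename, topic in topic_files.items():
        topic_id = topic.get("id")
        index[topic_id] = f"{filename}#{topic_id}"
        for elem in topic.iter():
            elem_id = elem.get("id")
            if elem is not topic and elem_id:
                index[elem_id] = f"{filename}#{topic_id}/{elem_id}"
    return index

def rewrite_links(root, index):
    """Turn resolvable legacy <link>s into <xref>s; return the unresolved targets."""
    unresolved = []
    for link in list(root.iter("link")):
        target = link.get("target")
        if target in index:
            link.tag = "xref"
            del link.attrib["target"]
            link.set("href", index[target])
        else:
            unresolved.append(target)  # left untouched for manual review
    return unresolved

# Made-up converted topic and legacy paragraph for illustration.
topics = {
    "prereqs.dita": ET.fromstring(
        '<topic id="prereqs"><title>Prerequisites</title>'
        '<body><p id="power">Check the power supply.</p></body></topic>'
    )
}
source = ET.fromstring(
    '<p>See <link target="power">power</link> and <link target="missing">this</link>.</p>'
)
index = build_id_index(topics)
unresolved = rewrite_links(source, index)
```

The list of unresolved targets is worth reviewing by hand, since it usually points at links that were already broken or at content that was dropped during conversion.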

Reuse, variables, and conditionals

DITA offers numerous reuse features at the topic, block, and paragraph level. As you begin planning the transition, consider how reuse could improve your content operations by eliminating redundant content and copy/paste work. 

Content variants let you further refine your content reuse. For example, you might have two sets of instructions that are identical except for a product name. You can capture the product name as a variable, so that you can generate two sets of instructions that use different product names from a single source file.
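
In DITA, this kind of variable is typically written as an element with a keyref attribute, and the value is supplied by the map at publishing time. The Python sketch below is a simplified stand-in for what a DITA processor does, not a replacement for one. The key name product-name and the product names are invented for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Single source paragraph; the key name "product-name" is hypothetical.
SOURCE = ET.fromstring(
    '<p>Restart <keyword keyref="product-name"/> before continuing.</p>'
)

def resolve_keyrefs(elem, keys):
    """Return a copy of elem with each keyref'd element filled in from keys."""
    resolved = copy.deepcopy(elem)
    for node in resolved.iter():
        key = node.get("keyref")
        if key in keys:
            node.text = keys[key]
    return resolved

def plain_text(elem):
    """Concatenate all text in the element, in document order."""
    return "".join(elem.itertext())

basic = plain_text(resolve_keyrefs(SOURCE, {"product-name": "Widget"}))
pro = plain_text(resolve_keyrefs(SOURCE, {"product-name": "Widget Pro"}))
```

The same source paragraph yields both variants, and the product name is defined in exactly one place per variant.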

Similarly, you can use conditionals to flag a chunk of text that belongs only in a specific variant, which allows you to generate the content with or without that chunk of text.
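
Conditional text in DITA is usually marked with attributes such as product or audience, and a filter decides what to exclude at publishing time. The Python sketch below imitates that filtering for the product attribute only. It is a minimal sketch, assuming block-level elements and an element that is dropped only when every one of its space-separated values is excluded. A real DITA processor handles more attributes and more rules.

```python
import copy
import xml.etree.ElementTree as ET

def filter_by_product(root, excluded):
    """Return a copy of root without elements whose product values are all excluded."""
    result = copy.deepcopy(root)
    for parent in list(result.iter()):
        for child in list(parent):
            values = child.get("product", "").split()
            if values and all(value in excluded for value in values):
                # Note: remove() also drops the child's tail text, so this
                # sketch is only safe for block-level elements.
                parent.remove(child)
    return result

# Made-up topic body; the product values are hypothetical.
body = ET.fromstring(
    "<body><p>Common step.</p>"
    '<p product="pro">Pro only.</p>'
    '<p product="basic pro">Basic and Pro.</p></body>'
)
basic_variant = filter_by_product(body, excluded={"pro"})
kept = [p.text for p in basic_variant.iter("p")]
```

The original tree is left untouched, so the same source can be filtered again with a different set of excluded values.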

You may have equivalent functionality in your legacy XML already, in which case you’ll map to DITA equivalents. It’s more likely, though, that you’ll add reuse, conditionals, and variants on the DITA side.

Keep in mind that localization requirements affect how you set up reuse, variables, and conditionals. Avoid overly complex reuse scenarios, especially inside sentences. 

Extending the DITA model through specialization

The planning process will give you a roadmap for what you need in your DITA content model. At this point, you can begin building out the needed customizations using specialization. Specialization is a mechanism in DITA that lets you add new tags and attributes without losing conformance with the DITA standard. 
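
Conformance is preserved because every DITA element carries a class attribute that lists its ancestry, from the base element to the most specialized one. A processor that does not know a specialized element can fall back to the base element named there. The Python sketch below reads that attribute. The <topple> element and the safety-d module name are hypothetical, and in real DITA the class value is normally supplied as a default by the grammar files rather than typed by authors.

```python
def ancestry(class_value):
    """Split a DITA class value into (module, element) pairs, base element first."""
    tokens = class_value.split()
    # The first token is "-" (structural) or "+" (domain); the rest are pairs.
    return [tuple(token.split("/", 1)) for token in tokens[1:]]

def fallback_element(class_value):
    """Return the base element name a processor could fall back to."""
    return ancestry(class_value)[0][1]

# Hypothetical class value for a <topple> element specialized from <note>.
TOPPLE_CLASS = "+ topic/note safety-d/topple "

chain = ancestry(TOPPLE_CLASS)
base = fallback_element(TOPPLE_CLASS)
```

Here the chain shows that a hypothetical <topple> is still a kind of <note>, which is why tools that only understand baseline DITA could continue to process it.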

Implementing in your authoring tools

Once you have the specialization files, you can build out your authoring and publishing environments. This may involve setting up authoring tools, a DITA component content management system, publishing pipelines, and more.

The difficulty of moving from custom XML to DITA depends on the complexity of the legacy model and how far it diverges from the DITA approach. If you have topic-based files with a fairly straightforward tag set (paragraphs, notes, titles, and lists), you can expect a relatively smooth transition. If you have extensive custom elements and metadata, expect a bigger effort.

And of course, if you decide you need some support with this process, please contact us.