March 11, 2014

Making the most of your conversion to XML, part 1

Your publishing workflow has been the same for years, but new technology, different customer requirements, and company growth are making you realize you might need a change. Your print-based processes won’t always be sustainable, and XML is looking like a possibility for the future. There’s just one problem: you have thousands of pages of legacy content that you’ll need to convert, and it’s not exactly XML-friendly.


Your print workflow won’t sustain you forever.
flickr: addendentry

Here are some simple changes you can start making to your content now to help a future conversion go more smoothly:

  • Create a style guide. The more consistent your content is, the easier it will be to convert to XML. So if you’re not already using a style guide for your documentation, develop one. Decide on the ways your tables, images, lists, headings, and other kinds of text will be structured and define the conventions for how they will be used. Start applying your style guide to all your active documents, and, if possible, update your older documents to adhere to its standards.
  • Tag your content correctly. The paragraph formats you apply will be the basis for how your content gets tagged as it is converted. Therefore, it is important to use these formats as intended—for example, your level-4 heading tag may be the same font and size as your figure caption tag, but using them interchangeably just because they look alike will lead to structural problems in the converted XML.
  • Stick to the rules. Using formatting overrides—such as setting one instance of a level-1 heading to start at the top of a page but not all of them—may be allowed within your template, but that doesn’t mean it’s a good idea. These small touch-ups to the text are a quick and easy way to make your documents more visually appealing for print, but they introduce inconsistencies that will cause problems during conversion. Having a style guide won’t help unless you stick to it.
  • Avoid special formatting. Need to create a new table style for that unusual table that doesn’t fit the rules in your template? Want to control where your long titles break across lines? Wish you could align two images and add space around them to make them more visually appealing on the page? These one-off formatting adjustments may be tempting to make, but they will only cause problems during conversion. Every manual tweak that is added to the source will have to be manually removed from the converted documents, since XML is designed to separate content from formatting.
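
To make the stakes of these guidelines concrete, here is a minimal sketch of how a conversion script might turn paragraph formats into XML elements. It is an illustration under stated assumptions, not how any particular conversion tool works: the format names (`Heading1`, `Body`, `FigureCaption`), the element names, and the `convert` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from paragraph format names to XML element names.
STYLE_TO_ELEMENT = {
    "Heading1": "title",
    "Body": "p",
    "FigureCaption": "caption",
}


def convert(paragraphs):
    """Convert (style, text, has_override) tuples into a <section> element.

    Returns the element plus a list of (index, reason) items that a person
    would have to review by hand after conversion.
    """
    root = ET.Element("section")
    review = []
    for index, (style, text, has_override) in enumerate(paragraphs):
        element_name = STYLE_TO_ELEMENT.get(style)
        if element_name is None:
            # A one-off format has no rule, so it needs manual attention.
            review.append((index, f"unmapped style: {style}"))
            element_name = "p"  # fall back to a plain paragraph
        if has_override:
            # Manual formatting has no place in the XML and is discarded.
            review.append((index, "formatting override dropped"))
        ET.SubElement(root, element_name).text = text
    return root, review


if __name__ == "__main__":
    legacy = [
        ("Heading1", "Installing the pump", False),
        ("Body", "Remove the cover.", True),
        ("OneOffTable", "Parts overview", False),
    ]
    section, todo = convert(legacy)
    print(ET.tostring(section, encoding="unicode"))
    for index, reason in todo:
        print(index, reason)
```

The script only sees format names, never appearance. A figure caption tagged with a heading format would come out as a `title`, and every override or one-off format ends up on the manual review list.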

Don’t have the funds or resources to clean up your content before conversion, or can’t spare the time within your deadline-driven schedule? You can still use the conversion process to improve the quality of your content going forward.

Look for opportunities to make your content more consistent as you convert it. For example, suppose each of your legacy documents begins with a list of the parts covered in that book, along with photos and descriptions. This list is structured differently from one document to the next: sometimes it's presented in a table, other times in a bulleted or numbered list, and other times in regular paragraphs. Choose one of these options and set up the parts list the same way in every converted XML document. Set a similar precedent whenever you spot a recurring pattern in your legacy content, so that the converted content is as clean and consistent as possible.
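
As a rough sketch of the parts-list example, the code below reads two differently structured legacy lists into one common shape and writes both out with a single XML structure. Everything here is an assumption made for illustration: the `partslist`, `part`, `name`, and `description` element names, the "Name: description" convention for list items, and the helper functions themselves.

```python
import xml.etree.ElementTree as ET


def from_table(rows):
    """Read parts from table rows, each a (name, description) pair."""
    return [(name.strip(), desc.strip()) for name, desc in rows]


def from_list(items):
    """Read parts from list items assumed to look like 'Name: description'."""
    parts = []
    for item in items:
        name, _, desc = item.partition(":")
        parts.append((name.strip(), desc.strip()))
    return parts


def build_parts_list(parts):
    """Write every parts list with the same XML structure."""
    root = ET.Element("partslist")
    for name, desc in parts:
        part = ET.SubElement(root, "part")
        ET.SubElement(part, "name").text = name
        ET.SubElement(part, "description").text = desc
    return root


if __name__ == "__main__":
    table_doc = from_table([(" Filter ", "Removes debris")])
    list_doc = from_list(["Filter: Removes debris"])
    print(ET.tostring(build_parts_list(table_doc), encoding="unicode"))
    print(ET.tostring(build_parts_list(list_doc), encoding="unicode"))
```

Whatever layout the legacy document used, the converted output has one structure, which is the precedent the paragraph above recommends setting.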

Finally, use the conversion to get your tech pubs team into the mindset of separating content from formatting, and to break the print-oriented habit of adjusting pages for aesthetic purposes. Whether or not you have time to clean up your legacy documents before conversion, separating content from formatting is one of the greatest benefits of moving to XML, and learning to embrace formatting-free content is one of the most important lessons of the transition.

Part 2 of this post, coming next month, will cover how to adjust to creating content separately from formatting.