July 21, 2010

Tech Tips: Quick Word to DITA table conversion

The other day I had to convert a large table from Word to DITA. I started looking at Word’s XML output and thought about transforming it with XSLT (which I have done in the past), but that seemed like too much trouble for this document. Then I remembered a technique a veteran SQL coder once showed me for loading large amounts of data into a SQL table, and I realized it could be readily adapted to DITA.

The solution hinges on two convenient behaviors of Word and Excel (or OpenOffice.org Writer and Calc). First, if you copy a table from Word and paste it into Excel, the table’s rows and columns populate rows and columns in Excel. Second, when you copy rows and columns from Excel into a text document (or, more precisely, an XML editor in text mode), each spreadsheet row is pasted as a single line of text, with its cells separated by tabs.
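To make the second behavior concrete, here is a minimal Python sketch that splits pasted spreadsheet text back into its cells. It assumes the usual clipboard layout (one line per row, tab-separated cells); the function name split_clipboard_rows is hypothetical, and cells that themselves contain tabs or line breaks are not handled.

```python
def split_clipboard_rows(text: str) -> list[list[str]]:
    """Split text copied from a spreadsheet into rows of cell strings.

    Assumes one line per row and tab-separated cells. Cells that contain
    tabs or line breaks of their own are not handled.
    """
    rows = []
    for line in text.splitlines():  # handles both \n and \r\n line endings
        if line:  # skip empty lines
            rows.append(line.split("\t"))
    return rows


pasted = "Name\tValue\nalpha\t1\nbeta\t2\n"
print(split_clipboard_rows(pasted))
# [['Name', 'Value'], ['alpha', '1'], ['beta', '2']]
```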

Now comes the fun part: in Excel you can insert columns before, between, and after the original table columns. In those new columns you can type DITA (or SQL) markup, such as “<row><entry>”, “</entry><entry>”, or “</entry></row>”, and then quickly copy it down the length of the spreadsheet by dragging the cell’s fill handle to the bottom of the table or simply double-clicking the handle.
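The markup columns simply concatenate fixed strings around each cell. For anyone who prefers a script to a spreadsheet, the same concatenation can be sketched in Python; row_markup is a hypothetical helper, not part of any DITA tool. Unlike the spreadsheet method, this sketch escapes &, < and > (which would otherwise produce malformed XML) and does not leave the tabs that separate pasted cells inside each <entry>.

```python
from xml.sax.saxutils import escape


def row_markup(cells: list[str], escape_text: bool = True) -> str:
    """Build one DITA <row> element from a list of cell strings.

    Mirrors the spreadsheet markup columns: "<row><entry>" before the first
    cell, "</entry><entry>" between cells, and "</entry></row>" after the last.
    """
    if escape_text:
        # Escape &, < and > so cell text such as "R&D" stays well-formed XML.
        cells = [escape(cell) for cell in cells]
    return "<row><entry>" + "</entry><entry>".join(cells) + "</entry></row>"


for cells in [["alpha", "1"], ["R&D", "2"]]:
    print(row_markup(cells))
# <row><entry>alpha</entry><entry>1</entry></row>
# <row><entry>R&amp;D</entry><entry>2</entry></row>
```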

Thus, you can copy a table from Word into Excel, add new columns around and between the original table’s columns, type DITA markup into those columns, and then copy and paste the whole range into your XML editor. Voilà, you have the body of a new DITA table.

All you have to do is add the appropriate <table>, <tgroup>, and <tbody> elements around the table contents (remember that <tgroup> requires a cols attribute giving the number of columns) and you’re done.
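The final wrapping step can be sketched the same way. This hypothetical wrap_table function assumes you already have one <row> line per table row and know the column count; it emits only a minimal <table>, <tgroup> and <tbody> wrapper, with no <colspec> or <thead> elements.

```python
def wrap_table(row_lines: list[str], cols: int) -> str:
    """Wrap <row> lines in minimal <table>, <tgroup> and <tbody> elements.

    DITA requires a cols attribute on <tgroup> giving the number of columns.
    """
    body = "\n".join("      " + line for line in row_lines)  # indent rows
    return (
        "<table>\n"
        f'  <tgroup cols="{cols}">\n'
        "    <tbody>\n"
        f"{body}\n"
        "    </tbody>\n"
        "  </tgroup>\n"
        "</table>"
    )


rows = [
    "<row><entry>Name</entry><entry>Value</entry></row>",
    "<row><entry>alpha</entry><entry>1</entry></row>",
]
print(wrap_table(rows, cols=2))
```

If the first row of the Word table holds column headings, you could move that row into a <thead> element placed before <tbody>.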

With a bit more thought, this technique can be used to add all sorts of markup to text as you convert it to DITA. How could you apply this technique?