Tech Tips: Quick Word to DITA table conversion
The other day I had to convert a large table from Word to DITA. I started looking at Word XML output and thought about transforming it with XSL (which I have done in the past), but that seemed to be too much trouble for this document. Then I remembered a technique an old SQL coder showed me for loading large amounts of data into a SQL table. I realized this technique could be readily adapted to DITA.
The solution hinges on two great behaviors in Word and Excel (or OpenOffice.org Text and Spreadsheet). First, if you copy a table from Word to Excel, the table columns and rows populate columns and rows in Excel. Secondly, when you copy rows and columns from Excel to a text document (or, more precisely, an XML editor in text mode), the text in each row is taken as a single line of text.
Now comes the fun part: in Excel you can add columns before, between, and after the original table columns. In those new columns you can add DITA (or SQL) markup (such as “<row><entry>”, “</entry><entry>”, or “</entry></row>”) and quickly duplicate that markup over the length of the spreadsheet (by dragging the cell’s drag handle to the bottom of the table, or double-clicking the handle).
Thus, you can copy a table from Word into Excel, add new columns between the columns from the original table, add DITA markup in those columns, then cut and paste the table into your XML editor. Voilá, you have the body of a new DITA table.
All you have to do is add the appropriate <table>, <tgroup>, and <tbody> elements around the table contents and you’re done.
With a bit more thought, this technique can be used to add all sorts of markup to text as you convert it to DITA. How could you apply this technique?
Larry Kunz
So simple even an English major can do it. I love it. Thanks, Simon!
Darin
Oh the hours I’ve spent creating SQL entries via this method…
But Excel doesn’t like cells starting with a single quote, and perhaps is unfriendly to other characters, so I found it easier to work in Word, which now does many of the table handling things just like Excel.
Darin
Oh yeah… the other method: in Word, convert table to text and use a funky character (such as “$”) as the separator. Then search and replace that character with your markup. Replace carriage returns (^p) with the appropriate end/beginning markup for each row.
Karen Lowe
I have used Excel extensively to both convert legacy to DITA and createnew content. I’m not an Excel formula guru by any means, but I
have learned some techniques that have helped me create entire stub
topics and ditamaps automatically. What used to take me days to get
into DITA can now be accomplished in hours with the help of Excel and
a little utility (script) written for me by a developer.
What I’ve done is set up formulas based on columns to create
DITA-compliant XML content. I use the Concatenate and Replace formulas a lot, as well LOWER (lower case) and Substitute. As a result, I can look at a developed screen, type the field names in one column, copy another column into notepad, and then run the script. The result – the help has a bunch of DITA-compliant field topics in XML, complete with topic titles and conrefs (to say “under construction”).
I then copy different columns to create the ditamaps. Another handy reason to use Excel is that I can easily do an alphabetical sort. After I run the build, out pops a screen guide with all field level topics.
This really helps me keep up with development. And I can spend my time dredging up content, not adding XML topics. 🙂
Karen
Simon Bate
Thanks, Karen! I’ll have to keep those tricks in mind.
What I didn’t mention in my original message was that I was also able to divide the table up into a number of smaller tables by adding intervening rows that contained the requisite DITA markup.
Adam Newton
@Karen Lowe — any chance you could share more info about your Excel + script solution for creating DITA topics and ditamaps? Sounds like a real time-saver.
— Adam (a Simon pal)