Strange bedfellows: InDesign and DITA
or, What you need to know before you start working on a DITA to InDesign project.
There are a lot of ways to get your DITA content rendered into print/PDF. Most of them are notoriously difficult; DITA to InDesign, though, may have the distinction of the Greatest Level of Suck™.
InDesign XML formats
InDesign provides several XML formats. InDesign Markup Language (IDML) is the most robust. An IDML file is a zip container (similar to an EPUB). If you open up an IDML archive, you’ll find files that define InDesign components, such as pages, spreads, and stories. If you save a regular InDesign file to IDML, you can reopen the IDML file and get back your InDesign file, complete with layouts, graphics, formatting, customizations, and so on.
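Because an IDML file is an ordinary zip archive, you can inspect one with any zip library. The following is a minimal sketch using Python's standard `zipfile` module; the function name `summarize_idml` and the file name in the usage note are hypothetical, chosen for illustration.

```python
import zipfile
from collections import defaultdict


def summarize_idml(path):
    """Group the member names of an IDML package by top-level folder."""
    groups = defaultdict(list)
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            folder, sep, _rest = name.partition("/")
            # Members without a slash sit at the package root.
            groups[folder if sep else "(root)"].append(name)
    return dict(groups)
```

Calling `summarize_idml("sample.idml")` on a file saved from InDesign should show the story and spread files grouped under their folders, which is a quick way to find the one story you actually care about.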
IDML is both a file format and a markup language. The IDML language is used inside the IDML file. In addition, a subset of IDML markup is used in InCopy files (ICML). Where IDML can specify the entire InDesign file, ICML just describes a single text flow.
(There is also INX, but that format is for older versions of InDesign and has now been deprecated.)
If you are planning to output from DITA to InDesign, you probably want ICML, which uses the same IDML markup as a full IDML file. The IDML specification is available as a very user-friendly PDF on Adobe’s site. I spent many not-glorious hours plowing through that document.
My best tip: If you need to understand how a particular InDesign component is set up in IDML, create a small test file and then save the file out to InCopy (ICML) format. This will give you an almost manageable snippet to review. You’ll find that InDesign includes all possible settings in the exported file. When you create your DITA-to-ICML converter, you can probably create a snippet that is 90 percent smaller (and includes much less stuff). The challenge is figuring out which 10 percent you must keep.
Understanding the role of InDesign templates
Use an InDesign template to specify page masters, paragraph styles, character styles, table styles, and more. This template becomes your formatting specification document.
To import XML content, do the following:
- Create an ICML/IDML file that contains references to paragraphs and other styles (more on this later).
- In InDesign, open a copy of the template file.
- Place the ICML file in your template copy. The style specifications in the template are then applied to the content in the ICML and you get a formatted InDesign file.
Of course, this nifty three-step procedure elides many months of heartbreak.
The mapping challenge
A basic paragraph, in DITA, looks like this:
<p>Paragraph text goes here.</p>
The equivalent output in IDML is this:
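<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/body">
<CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
<Content>Paragraph text body goes here.</Content>
<Br/>
</CharacterStyleRange>
</ParagraphStyleRange>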
Some things to notice:
- The inline formatting (CharacterStyleRange) is specified even when there is no special formatting.
- The content is enclosed in a <Content> tag.
- The <Br/> tag toward the end is required. Without it, the paragraphs are run together. In other words, if you do not specify a line break, InDesign assumes that you do not want line breaks between paragraphs.
- Extra whitespace inside the <Content> tag (such as tabs or spaces) will show up in your output. You do not want this.
- Managing space between paragraph and character tags is highly problematic.
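The points in the list above can be sketched in a few lines of Python. This is an illustration only, not the XSLT used in our converter: the function name `paragraph_to_idml` and the paragraph style name `Body` are assumptions, and the sketch handles only paragraphs that contain plain text.

```python
import re
import xml.etree.ElementTree as ET

# InDesign's built-in "no character style" reference.
NO_CHAR_STYLE = "CharacterStyle/$ID/[No character style]"


def paragraph_to_idml(p, paragraph_style="Body"):
    """Build a ParagraphStyleRange for a DITA <p> that holds plain text only.

    "Body" is a hypothetical style name; it must exist in your template.
    """
    # Collapse tabs, newlines, and repeated spaces so that source
    # indentation does not leak into <Content>.
    text = re.sub(r"\s+", " ", "".join(p.itertext())).strip()
    psr = ET.Element("ParagraphStyleRange",
                     AppliedParagraphStyle="ParagraphStyle/" + paragraph_style)
    # The character range is emitted even though no formatting applies.
    csr = ET.SubElement(psr, "CharacterStyleRange",
                        AppliedCharacterStyle=NO_CHAR_STYLE)
    ET.SubElement(csr, "Content").text = text
    ET.SubElement(csr, "Br")  # paragraph break; without it paragraphs run together
    return psr


p = ET.fromstring("<p>\n\tParagraph   text goes here.\n</p>")
print(ET.tostring(paragraph_to_idml(p), encoding="unicode"))
```

Normalizing whitespace before writing `<Content>` is the simplest way to keep pretty-printed source files from producing stray tabs and spaces in the layout.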
Other important information:
- You must declare the paragraph and character tags you are using at the top of the IDML file in the RootParagraphStyleGroup and RootCharacterStyleGroup elements, respectively.
- You cannot nest character tags in InDesign. Therefore, if you have nested inline elements in DITA, you must figure out how to flatten them:
<b><i>This is a problem in InDesign</i></b>
You have to create combination styles in InDesign.
- Generally, you will have more InDesign paragraph styles than DITA elements because DITA (and XML) have hierarchical structure. For example, a paragraph p tag might be equivalent to a regular body paragraph, an indented paragraph (inside a list), a table body paragraph, and more. You have to use the element’s context as a starting point for mapping it to InDesign equivalents.
- In addition to using hierarchical tags, if you want to maintain compatibility with specializations, you must use class attributes rather than elements for your matches. That leads to some highly awkward XSLT template match statements.
- In addition to paragraph and character styles, you need to declare graphics, cell styles, table styles, object styles, and colors. (There may be more. That’s what I found.)
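Two of the points above, flattening nested inline elements and choosing a paragraph style from context, can be sketched in Python as follows. This is an illustration under stated assumptions rather than our XSLT: the style names (`Bold`, `Italic`, `BoldItalic`, `Body`, `ListPara`, `TableBody`) and the function names are hypothetical, and the sample input carries explicit `class` attributes because ElementTree does not apply DTD defaults.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from sets of DITA highlight tokens to InDesign
# character style names. Each combination style must exist in the template.
STYLE_FOR = {
    frozenset(): "$ID/[No character style]",
    frozenset({"hi-d/b"}): "Bold",
    frozenset({"hi-d/i"}): "Italic",
    frozenset({"hi-d/b", "hi-d/i"}): "BoldItalic",
}


def highlight_tokens(element):
    """Return the hi-d/* tokens found in an element's DITA class attribute."""
    return {t for t in element.get("class", "").split() if t.startswith("hi-d/")}


def flatten(element, active=frozenset()):
    """Yield (style name, text) runs, one flat run per stretch of text.

    Raises KeyError for a combination that has no entry in STYLE_FOR.
    """
    active = active | highlight_tokens(element)
    if element.text:
        yield STYLE_FOR[active], element.text
    for child in element:
        yield from flatten(child, active)
        if child.tail:
            # Tail text follows the child, so it takes the parent's formatting.
            yield STYLE_FOR[active], child.tail


# Hypothetical context rules, checked in order: (ancestor class token, style).
CONTEXT_STYLES = [("topic/entry", "TableBody"), ("topic/li", "ListPara")]


def paragraph_style(ancestor_classes):
    """Choose a paragraph style for a topic/p from its ancestors' class values."""
    tokens = {t for value in ancestor_classes for t in value.split()}
    for token, style in CONTEXT_STYLES:
        if token in tokens:
            return style
    return "Body"


p = ET.fromstring(
    '<p class="- topic/p ">Plain <b class="+ topic/ph hi-d/b ">bold '
    '<i class="+ topic/ph hi-d/i ">both</i></b> tail</p>'
)
for style, text in flatten(p):
    print(repr(style), repr(text))
```

Matching on class tokens instead of element names means a specialized element whose class value still includes `hi-d/b` is picked up without extra rules, which is the reason for the awkward match statements mentioned above.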
Tables are not your friend. InDesign uses a particularly…unique table structure, in which it first declares the table grid and then just lists off all the cells. The grid coordinates start at 0:0. (Most “normal” table structures group the cells into rows explicitly.)
cell content goes here …
As you can see, this gets complicated fast.
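To make the grid-then-cells layout concrete, here is a rough Python sketch that turns a rectangular list of rows into that shape. It makes several assumptions: the element and attribute names are written as I recall them from the IDML specification, the cell address is taken to be column:row, and required details such as `Self` identifiers, column widths, header and footer counts, and applied styles are left out. Check all of these against a file exported from InDesign before relying on them.

```python
import xml.etree.ElementTree as ET


def table_to_idml(rows):
    """Build a simplified Table element from rows of cell strings.

    Assumes every row has the same number of cells and no cell spans.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    table = ET.Element("Table", BodyRowCount=str(n_rows), ColumnCount=str(n_cols))
    # The grid comes first: one Row and one Column element per index.
    for r in range(n_rows):
        ET.SubElement(table, "Row", Name=str(r))
    for c in range(n_cols):
        ET.SubElement(table, "Column", Name=str(c))
    # Then the cells, each addressed by name rather than nested inside a row.
    for r, row in enumerate(rows):
        for c, text in enumerate(row):
            cell = ET.SubElement(table, "Cell", Name=f"{c}:{r}",
                                 RowSpan="1", ColumnSpan="1")
            psr = ET.SubElement(cell, "ParagraphStyleRange")
            csr = ET.SubElement(psr, "CharacterStyleRange")
            ET.SubElement(csr, "Content").text = text
    return table


table = table_to_idml([["a", "b"], ["c", "d"]])
print(ET.tostring(table, encoding="unicode"))
```

The point of the sketch is the ordering: a row-oriented source table has to be walked once to count the grid and again to emit the cells, because nothing in the output groups cells by row.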
There is so much more, but I think you get the idea. It is definitely possible to create a DITA to InDesign pipeline, but it is challenging. If you are looking at a project like this, you will need the following skills:
- Solid knowledge of InDesign
- Solid knowledge of the DITA tag set
- Ability to build DITA Open Toolkit plugins, which means knowledge of Ant and XSLT at a minimum
The open source DITA4Publishers project provides a pipeline for output from DITA to InDesign. We looked at using it as a starting point in mid-2013. At the time, we found that it would be too difficult to modify DITA4Publishers to support the extensive customization layers required by our client.
Our DITA to InDesign converter is a DITA Open Toolkit plugin (built on version 1.8). It supports multiple unrelated templates for different outputs and specialized content models. It also includes support for index markers, graphics, and other items not addressed in this document. Scriptorium is available for custom plugin work on InDesign and other output types. For more information, contact us.
Heh. Months of heartbreak indeed.
I think the post should end with “Do Not Try This At Home.” Yuck.
I thought that was implied by the next-to-last sentence!
As soon as I read the headline on Twitter, this image popped into my head!
I’m still twitching now.
I don’t understand why you are trying to use the IDML or INX formats. These are methods for creating an InDesign document but not necessarily an XML-based document. Are you trying to create a document from scratch outside of InDesign? Or do you want to have an XML-based document?
If the purpose is simply to create documents from XML data or CMS systems, you could simply create an InDesign template and import the XML. For more complete docs you can throw an XSLT into the mix. Both of these methods result in a file that has a proper XML structure.
But I’m guessing you are trying to generate docs without having to open InDesign first. You may want to look at using InDesign Server; it allows you to create very complex documents from XML and predefined templates automatically. In the end you can keep the XML structure or discard it as you desire.
Although InDesign will accept XML, its XML import is quite limited. DITA XML is complex and uses lots of linked objects. In order to import DITA XML into InDesign, you have to do something like this:
DITA XML >> simplified/flattened XML >> import into InDesign template
We looked at that option. For this specific project, we went with:
DITA XML >> transform into IDML (no formatting) >> place IDML into InDesign template
In both scenarios, the master pages/formatting information is managed on the InDesign side.
That sounds very interesting. Was the process fully automated? Or, did you have to build the IDML file by hand? Did it involve scripting or just XSLT?
On a completely different subject, do you know of a flattened or simplified DTD for DITA and InDesign?
We wrote a custom DITA Open Toolkit plugin (mostly XSLT) that creates the IDML file. No human intervention required on the IDML. Final layout in InDesign included humans to do additional/final production/formatting.
There have been rumblings about DITA Lite, or simplified DITA. I haven’t paid a ton of attention.
On the InDesign side, although there isn’t a simplified DTD that I’m aware of, it’s possible to create a fairly minimal IDML file that InDesign will accept, so you can sort of self-simplify. 🙂
I’ve only just read this and my thought was that InDesign is a mess – a consultant’s delight. (There’s probably a single word for that in German.)
A follow-up “compare and contrast” between this product and the equally knotty Scribus would interest me.
It’s true that InDesign leads to revenue. I’m not entirely sure it’s worth the hit to the consultant’s overall mental health, though.
A Scribus comparison would be interesting, but unfortunately, I don’t have any expertise with it.