Putting the X back in XML
There’s more to XML for technical communication than just DITA.
Much of the industry chatter revolves around DITA content management, DITA tools, the vagaries of the DITA Open Toolkit, and DITA implementation hurdles. At Scriptorium, a steady (and large) percentage of our work is DITA-based. But there are lots of things you can do with non-DITA XML that are relevant to technical communication. This post describes a few creative solutions; perhaps they will help you think creatively about your options.
(All names and some project specifics changed.)
XML as a conversion tool
Our first customer (let’s call them the Allspice* Company) has a custom database, which the technical writers have no control over. The database contains information needed, in somewhat different form, for the user documentation. The documentation is currently in unstructured FrameMaker, and one of the chapters contains all of the database-derived information.
Here’s what we did:
- Allspice already has the ability to generate XML from the database.
- We took the generated XML and processed it with an XSLT transform to create XML that is FrameMaker-friendly. (For example, tables are set up as CALS tables in the processed content.) We also excluded some content in this step.
- In FrameMaker, we built a structured application that imports the custom XML and formats it to match the existing unstructured FrameMaker files.
Allspice can now export content from the database and bring it into FrameMaker as a fully formatted structured FrameMaker file. That structured file is then included in a book with the unstructured FrameMaker content. This process is completely automated and eliminates manual maintenance of the database-derived content in FrameMaker.
To me, this project was especially interesting because XML is just a convenient intermediate format. There’s no structured authoring involved.
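The middle step (reshaping the exported XML into FrameMaker-friendly CALS tables and dropping unwanted content) can be sketched roughly as follows. This is only an illustration: the real project used an XSLT transform, and this sketch substitutes Python's standard library. The `records`/`record` elements, the `internal` attribute, and the column names are invented for the example and are not Allspice's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical database export; element and attribute names are assumptions.
SOURCE = """<records>
  <record internal="no"><name>timeout</name><default>30</default></record>
  <record internal="yes"><name>debug_key</name><default>0</default></record>
  <record internal="no"><name>retries</name><default>3</default></record>
</records>"""

COLUMNS = ["name", "default"]  # child elements that become table columns


def to_cals_table(source_xml: str) -> ET.Element:
    """Build a CALS table (table/tgroup/thead/tbody/row/entry) from the export."""
    records = ET.fromstring(source_xml)
    table = ET.Element("table")
    tgroup = ET.SubElement(table, "tgroup", cols=str(len(COLUMNS)))
    for i in range(1, len(COLUMNS) + 1):
        ET.SubElement(tgroup, "colspec", colname=f"c{i}")

    # Header row: one entry per column name.
    head_row = ET.SubElement(ET.SubElement(tgroup, "thead"), "row")
    for col in COLUMNS:
        ET.SubElement(head_row, "entry").text = col

    tbody = ET.SubElement(tgroup, "tbody")
    for record in records.findall("record"):
        # Exclude content that should not reach the documentation.
        if record.get("internal") == "yes":
            continue
        row = ET.SubElement(tbody, "row")
        for col in COLUMNS:
            ET.SubElement(row, "entry").text = record.findtext(col, default="")
    return table


table = to_cals_table(SOURCE)
print(ET.tostring(table, encoding="unicode"))
```

The output of a step like this is what the structured application would then import and format; the formatting rules themselves live in FrameMaker, not in the transform.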
XML as a publishing gateway
Another customer, Basil Corporation*, is using XML as a gateway to other formats. Once again, there is information in the organization that the technical writers must publish, but the writers do not control the source files for that information. In this case, Basil has XML files that use a custom structure. We preprocessed the files to (again) produce XML that is more usable from a content point of view. We then wrote (another) FrameMaker structured application for PDF output and a transformation that produces man pages (nroff files). Once again, there’s no actual XML-based authoring—just XML being used to extract the needed deliverables.
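To give a feel for the man page side, here is a minimal sketch of turning a custom XML description of a command into nroff source using the standard man macros (.TH, .SH, .TP, .B). The `command`, `summary`, `synopsis`, `description`, and `option` elements are hypothetical stand-ins, not Basil's real structure, and the sketch is written in Python rather than as the transformation we actually built.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom XML; every element and attribute name is an assumption.
SOURCE = """<command name="frobnicate" section="1">
  <summary>process widget files</summary>
  <synopsis>frobnicate [-v] FILE</synopsis>
  <description>Reads FILE and writes a report to standard output.</description>
  <option flag="-v">Print verbose diagnostics.</option>
</command>"""


def escape_roff(text: str) -> str:
    """Escape characters that roff would otherwise interpret."""
    text = text.replace("\\", "\\e").replace("-", "\\-")
    # A leading . or ' would be read as a roff request, so guard it.
    if text.startswith((".", "'")):
        text = "\\&" + text
    return text


def to_man(source_xml: str) -> str:
    cmd = ET.fromstring(source_xml)
    name = cmd.get("name")
    lines = [
        f'.TH {name.upper()} {cmd.get("section")}',
        ".SH NAME",
        f"{escape_roff(name)} \\- {escape_roff(cmd.findtext('summary'))}",
        ".SH SYNOPSIS",
        escape_roff(cmd.findtext("synopsis")),
        ".SH DESCRIPTION",
        escape_roff(cmd.findtext("description")),
    ]
    options = cmd.findall("option")
    if options:
        lines.append(".SH OPTIONS")
        for opt in options:
            # .TP starts a tagged paragraph: the flag is the tag, the text follows.
            lines += [".TP", ".B " + escape_roff(opt.get("flag")),
                      escape_roff(opt.text or "")]
    return "\n".join(lines) + "\n"


man = to_man(SOURCE)
print(man)
```

The same preprocessed XML can feed both this kind of transform and the FrameMaker application, which is the point of treating XML as a gateway.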
Custom XML for unique requirements
For a project for Cinnamon, Inc.*, we have topic-based content, which needs to be delivered in very simple HTML5. We ruled out DITA as a source format for a variety of reasons. The ones I can discuss are:
- The DITA content model is much more complex than what Cinnamon needs.
- The DITA Open Toolkit is much more complex than what Cinnamon needs.
- The various DITA CMSs are bigger than what Cinnamon wants (which eliminates one major advantage of DITA—the ready availability of CMS solutions).
Cinnamon needs to manage a lot of metadata, but the content itself is straightforward. Here, the solution will be a custom content model that provides what Cinnamon needs (and nothing else), along with an open-source tool stack. Yes, there will be a lot of custom programming. I spent a good bit of time looking for a packaged (commercial or open-source) solution that meets Cinnamon’s requirements, but in the end, we realized that a bare-metal custom build was going to be the most efficient option.
These three projects have several things in common:
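As a rough picture of what "a custom content model that provides what Cinnamon needs (and nothing else)" might look like, here is a small sketch: a topic with a handful of metadata elements and paragraphs, rendered to very simple HTML5. The `topic`, `meta`, `title`, and `para` names are made up for illustration and do not reflect Cinnamon's actual model or tool stack.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical minimal content model: metadata plus a title and paragraphs.
SOURCE = """<topic id="install">
  <meta name="product" value="Widget"/>
  <meta name="audience" value="admin"/>
  <title>Installing the widget</title>
  <para>Unpack the archive &amp; run the installer.</para>
  <para>Restart the service.</para>
</topic>"""


def to_html5(source_xml: str) -> str:
    topic = ET.fromstring(source_xml)
    title = escape(topic.findtext("title", default=""))
    # Carry each metadata element through as an HTML meta tag.
    metas = [
        f'<meta name="{escape(m.get("name"))}" content="{escape(m.get("value"))}">'
        for m in topic.findall("meta")
    ]
    paras = [f"<p>{escape(p.text or '')}</p>" for p in topic.findall("para")]
    lines = [
        "<!DOCTYPE html>",
        '<html lang="en">',
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{title}</title>",
        *metas,
        "</head>",
        "<body>",
        f'<article id="{escape(topic.get("id"))}">',
        f"<h1>{title}</h1>",
        *paras,
        "</article>",
        "</body>",
        "</html>",
    ]
    return "\n".join(lines) + "\n"


page = to_html5(SOURCE)
print(page)
```

With a model this small, the whole transform fits on one screen, which is part of the argument for not taking on a larger content model and toolkit.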
- The use of XML (and not DITA) to solve business problems related to technical communication
- Heavy reliance on custom programming
- Heavy reliance on open standards (particularly XSLT and Ant)
Is this a trend? I hope so, because these projects have been fascinating to implement.
What do you think?
* Not their real name.
The underlying implication of XML workflows (DITA or otherwise) is a reliance on batch-oriented, one-size-fits-all publishing. By technical necessity, these workflows are limited to very simplistic output formats (simple PDF, the infamous “tri-pane help,” basic web sites, and so on).
Is this really the model that tech com should be embracing? We now live in a world where these types of solutions aren’t very relevant; today’s user experience design demands well thought-out, highly interactive deliverables that can’t be produced using batch publishing processes.
The DITA Open Toolkit does sort of “encourage” batch publishing. The need for lightweight, incremental publishing is one reason that we’re seeing demand for non-OT solutions.
I don’t agree that those workflows are limited to simplistic output formats, though. We have built some fairly sophisticated deliverables from XML and DITA.
Thank you for your response, Sarah.
Perhaps it’s just a matter of definition. When I talk about sophisticated layouts, I’m thinking KF8, iBooks, interactive ebooks, InDesign liquid layouts, that sort of thing. The OT doesn’t get you any of that.
In fact, if your goal is to produce these sorts of deliverables, then I don’t really see why you would want to consider XML at all (and DITA in particular). HTML5 would probably be more appropriate for both authoring and publishing (plus, it wouldn’t require the super expensive consultants, tools, and specialized writers required for a DITA implementation).
First, I should probably mention that I’m one of the “super expensive consultants.” 🙂
But the real question is not “do you need format X?” The question is, “Do you need a repeatable, automated process to produce format X?” Because if you’re doing a single book, or one book per year, then you can hand-craft each deliverable. If you’re producing tens of thousands of pages per year, then you need automation.
KF8 and iBooks are just EPUB. Those are easy enough from XML and/or DITA. Interactive ebooks are a different kettle of fish, but even there, semantic tagging can be beneficial to build the interactivity automatically for certain things. (Think addresses that are mapped or something.) I don’t know much of anything about ID liquid layouts.
Oh, and using HTML5 does not preclude the need for super expensive consultants. 🙂
Out-of-the-box PDFs, ebooks, and HTML formats from standard tool chains may be basic, but they are typically customized. Likewise, you can write XSLT to give you whatever you want. You’ve probably used sites generated using XML tool chains but didn’t recognize them because their builders wrote custom XSLT or heavily customized the standard output.
Part of the appeal, though, is that you can take the same source and turn it into a variety of formats or change the look and feel programmatically. A system like this may not be appropriate for things like ad hoc marketing docs, however.
I’d update and extend the list in the last bullet a bit:
* XSLT 2.0
* XProc (maybe run from Maven)
> Is this a trend?
From my (European) POV this is the reality in every XML project with educated clients. The DITA buzz »just« created more interest in XML; and a good share of people still did not understand the real benefits of XML – basically thinking of DITA and/or XML as a file format. But as soon as they got the original idea of XML, they also started to have their individual requirements, which led to customized solutions. Some of them started with DITA, some of them with other pseudo-standard DTDs, some built their own DTD from scratch.
The availability of affordable tools (compared to SGML) really helped. Of course, the client must be willing to go this way, and companies which insist on buying shrink-wrap software only will never be our customers 🙂