Skip to main content

Tag: xml


XML for Lone Writers?

My December article for STC Intercom, XML & Lone Writers: Can They Go Together? is now available. From the conclusion:

The relatively low percentage of lone writers who have implemented XML is a logical result of the typical lone writer working environment. Although it is possible for lone writers to implement XML, a very cautious evaluation of the idea is definitely in order. Given the current status of the authoring and publishing tools, any lone writer who implements XML will need to master fairly demanding tools and technologies.

The stars of this article are the members of the Lone Writer SIG mailing list, who generously responded to a request for information.

XML & Lone Writers: Can They Go Together? (PDF, 200K)

Read More

XML 101

My latest XML Strategist article, “The ABCs of XML,” is available as a PDF file (144K). This article was originally published in the September/October 2009 issue of Intercom.

The technical side of XML is not much more difficult than HTML; if you can handle a few HTML angle brackets you can learn XML. […] If […] you don’t like using styles and
prefer to format everything as you go, you are going to loathe structured authoring. 

Just trying to make sure that there are no surprises. The article itself is a very basic introduction to the principles that make XML important for technical communication: automation, baseline architecture (sorry…I had problems with B), and consistency.

Read More

A strident defense of mediocre formatting

In addition to a gratuitous (and entertaining) swipe at “noisome” DITA “fanboys,” Roger Hart argues that we need to reconsider the disadvantages of automated formatting:

The thing is, [separation of content and formatting has] all been taken rather stridently to heart in certain quarters, leading to a knee jerk reaction whenever author-controlled formatting/pagination/lineation is mentioned as anything other than bleak, sulphurous devilry. This is twaddle. […]

Uncertainty in meaning is anathema to user intelligibility. If we’re going to make sure we’re not writing poetry, there’s definitely value in having poetry’s level of control over semantic blocks.

Of course, it’s fully possible that this is an expensive distraction.

Possible? It’s definitely expensive. It’s possible that it’s a distraction.

I think Hart perhaps unintentionally put his finger on the real issue: value. How much value (in the form of improved comprehension) is added to a technical document when you are able, in the words of commenter Brian Harris, to “lovingly handcraft” each page?

How much value (in the form of cost avoidance) is added to an organization when you are able to spit out a reasonably formatted document in a few minutes?

Actually, I have a different question. How far should we take this argument? Here’s an example of the pinnacle of handcrafting:

Book of Kells image
Can we all agree that this might perhaps take handcrafting a little too far?

Compared to the Book of Kells (above), the Gutenberg Bible looks quite pedestrian:

Gutenberg Bible image

You can just imagine the scribes with their quills, lapis, gold leaf, and other implements muttering, “That Gutenberg and his noisome fanboys. He can’t even render two colors without our help. Poser. It’ll never last.”

Formatting automation removes cost from the process of creating and delivering content. For technical documents that change often and are perhaps delivered in multiple languages, it removes a lot of cost. Let’s assume that handcrafted pages can improve ease of reading and comprehension with careful copy-fitting and adjusted spacing (Hart’s article mentions “headings, line breaks, intra-word, etc”). This increases the cost of the content.

What happens when content is expensive? Fewer people get to see it.

Books in Europe went from 50000 before Gutenberg to 12 million 50 years later.

I think we can all agree that e-books offer none of the typographic sophistication in question here. Bill Gates (yes, that Bill Gates) wrote in 1999:

It is hard to imagine today, but one of the greatest contributions of e-books may eventually be in improving literacy and education in less-developed countries. Today people in poor countries cannot afford to buy books and rarely have access to a library. 

Essentially, we can produce documents inexpensively and give more people access to them as a direct result of lower cost, or we can climb on our typographic high horse and whine about word spacing.

I’m with the noisome fanboys.

Read More

Ignoring DOCTYPE in XSL Transforms using Saxon 9B

Recently I had to write some XSL transforms in which I wanted to ignore the DOCTYPE declarations in the source XML files. In one case, I didn’t have access to the DTD (and the files wouldn’t have validate even if I did). In the other case, the XML files were DITA files, but I had no need or interest in validating the files; I simply needed to run a transform that modified some character data in the files.

In the first case, I ended up writing a couple of SED scripts that removed and re-inserted the DOCTYPE declaration. By the time I encountered the second case, I wanted to do something less ham-fisted, so I started investigating how to direct Saxon to ignore the DOCTYPE declaration.

My first thought was to use the -x switch in Saxon. Perhaps I didn’t use it correctly, but I couldn’t get it to work. Even though I was using a non-validating parser (Piccolo), Saxon kept telling me that the DTD couldn’t be found.

I went back to the drawing board (aka Google) and found a note from Michael Kay that said, “to ignore the DTD completely, you need to use a catalog that redirects the DTD reference to some dummy DTD.” Michael provided a link to a very useful page in the Saxon Wiki that discussed using a catalog with Saxon. After a bit of experimentation, I got it working correctly. In this blog post, I’ve distilled the information to make it useful to others who need to ignore the DOCTYPE in their XSL.

Before I describe the catalog implementation, I’d like to point out a simple solution. This solution works best when a set of XML files are in a single directory and all files use the same DOCTYPE declaration in which the system ID specifies a file:

<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">

In this case, you don’t need a catalog. It’s easier to create an empty file named “topic.dtd” (a dummy DTD) and save it in the same directory as the XML files. The XML parser looks first for the system ID; if it finds a DTD file, it uses it. Case closed.

However, there are many cases in which this simple solution doesn’t work. The system ID (“topic.dtd” in the previous example) might specify a path that cannot be reproduced on your machine…or the XML files could be spread across multiple directories…or there could be many different DOCTYPEs…or…

In these cases, it makes more sense to set up a catalog file. To specify a catalog with Saxon, you must use the XML Commons Resolver from Apache (resolver.jar). You can download the resolver from SourceForge. The good thing is, if you have the DITA Open Toolkit installed on your machine, you already have a copy of the resolver.jar file. The file is in %DITA-OT%libresolver.jar. You specify the class path for the resolver in the Java command using the -cp switch (shown below).

The resolver requires you to specify a catalog.xml file, in which you map the the public ID (or system ID) in the DOCTYPE declaration to a local DTD file. The catalog.xml file I created looks like this:

<catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<public publicId="-//OASIS//DTD DITA Topic//EN" uri="dummy.dtd"/>
<public publicId="-//OASIS//DTD DITA Concept//EN" uri="dummy.dtd"/>
<public publicId="-//OASIS//DTD DITA Task//EN" uri="dummy.dtd"/>
<public publicId="-//OASIS//DTD DITA Reference//EN" uri="dummy.dtd"/>

Note that the uri attribute in each entry points to a dummy DTD (an empty file). The file path used for the dummy.dtd file is relative to the location of the catalog file.

Putting it all together, I created a DOS batch file to run Java and invoke Saxon:

java -cp c:saxon9saxon9.jar;C:DITA-OT1.4.3libresolver.jar ˆ
-Dxml.catalog.files=catalog.xml ˆ
net.sf.saxon.Transformˆ ˆ ˆ ˆ
-xsl:my_transform.xsl ˆ

The Java -cp switch adds class paths for the saxon.jar and resolver.jar files. The -D switch sets the system property xml.catalog.files to the location of the catalog.xml file.

The switches following the Java class (net.sf.saxon.Transform) are Saxon switches.

  • -r – class of the resolver
  • -x – class of the source file parser
  • -y – class of the stylesheet parser

Note, I’m using Windows (DOS) syntax here. If you are using Unix (Linux, Mac), separate the paths in the class path with a colon (:) and use the backslash () as a line continuation character.

When you run Saxon this way, you’ll notice two things: first, Saxon doesn’t complain about the DTD (yay!), but secondly, there is no DOCTYPE declaration in the output. I’ll address how to add the DOCTYPE declaration back to the output XML file in my next blog post.

Read More

Learn DITA and XML at your desk

For August and September, our webinar schedule is as follows:
DITA 101, August 18 at 11 a.m. Eastern time
Participants will learn about basic Darwin Information Typing Architecture (DITA) concepts, the business case for implementing DITA, and some typical uses of DITA. This webinar is ideal for those who are considering a move to structured authoring based on the DITA standard. Register
Demystifying DITA to PDF Publishing, September 10 at 11 a.m. Eastern time
When a company implements a DITA-based workflow, the most difficult technical obstacle is often setting up a PDF/print publishing workflow. This session discusses the advantages and disadvantages of using the DITA Open Toolkit, FrameMaker, InDesign, and other options to create PDF output from DITA content. Basic familiarity with DITA, Extensible Markup Language (XML), and related technologies is helpful but not required. Register
What Do Movable Type and XML Have in Common?, September 22 at 11 a.m. Eastern time
The invention of movable type changed the economics of information by making the process of copying a book by hand obsolete. More than 500 years later, XML seems to be doing the same to desktop publishing. But where movable type changed the economics of a mechanical process—creating printed 
copies—XML changes the economics of content authoring, formatting, and customization. This webinar takes a look at how publishing technologies revolutionize the way people consume information and how those technologies affect authors. Register
Each webinar is $20. 
During the sessions, you can interact with the presenter and other students through the chat interface or the audio connection. There is a question-and-answer session at the end of each webinar. The Q&A is not included in session recordings, which are available for download later. Participants in the sessions receive a free recording.
To register for these webcasts, or to purchase recordings of past webinars, go to our online store.

Read More

Top five reasons to like XMetal and OXygen

by Sheila Loring

Full disclosure: We’re an XMetaL Services Provider and have no particular affiliation with oXygen.

I’m in the fortunate situation of having access to both XMetaL 5.5 and oXygen 9.3. Both are excellent XML editors for different reasons. I’d hate for Scriptorium to make me choose one over the other.

From the viewpoint of authoring XML and XSLT, here are my top five features of both editors:


  • Apply XSLT on the fly: You can associate an XML file with an XSLT and transform the XML within oXygen. Goodbye, command line! XMetaL will convert the document to a selected output format. You don’t choose the XSLT–it hasn’t been a big concern for me.
  • Indented code: The pretty-print option makes working with code so easy. You can set oXygen to do this automatically when you open a file or on demand. The result is code indented according to the structure. XMetaL doesn’t have pretty print.
  • Autocompleting tags: As you type an element, oXygen pops up a list of elements beginning with the typed string. You press Enter when you find the right tag, and the end tag is inserted for you. The valid attributes at any particular point are also shown in a drop-down list. XMetaL doesn’t have autocompleting tags.
  • Find/replace in one or more documents: I’ve often needed to search and replace strings in an entire directory. In XMetaL, you can only find and replace in the current document.
  • Comparing two documents or directories: Compare files by content or timestamp. In a directory, you can even filter by type so only XML files, for example, are compared. XMetaL doesn’t offer this feature.


  • Auto-tagging content: You can copy and paste content from an unstructured document (a web page, for example), and XMetaL automatically wraps the content in elements. Even tables and lists are wrapped correctly. This can be handy if you have a few documents to convert. In oXygen, the content is pasted as plain text.
  • Auto-assignment of ID attributes: Never worry about coming up with unique IDs. XMetaL will assign them to the types of elements you select. Warning: The strings are quite long, as in “topic_BBEC2A36C97A4CADB130784380036FD6.” oXygen only inserts IDs on the top-level element but full support will be added in version 10.3.
  • Auto-insertion of basic elements: When you create a document, XMetaL inserts placeholders for elements such as title, shordesc, body, and p. It’s a small convenience. oXygen will also insert elements if you have Content Completion selected in the Preferences.
  • WYSIWYG view of tables: The table is displayed as you’d see it in a Word or FrameMaker document. In oXygen, all you see are the table element tags.
  • Reader-friendly tag view: The tags are a bit easier to read in XMetaL than oXygen. In XMetaL, the opening and closing tags are displayed on one line when possible. This feature saves space on the page and makes the document easier to read in tag view. For example, you might have a short sentence wrapped in p tags. In XMetal, the p tags are displayed on the same line. In oXygen, the p tags are always on separate lines. This is another convenience that doesn’t sound like a big deal, but it really makes a difference while you’re authoring.

oXygen and XMetal have so many other strengths. I’ve just chosen my top five features.

What I’d like to see in XMetaL: The ability to indent code, the ability to drag and drop topics in the map editor.
What’s I’d like to see in oXygen: The ability to view a table–lines and all–in the WYSIWYG view instead of just the element tags.

So how do I choose which editor to use at a particular moment? When I’m casually authoring in XML, I choose XMetaL for all of reasons you read above. The WYSIWYG view is more user-friendly to me. But when I’m writing XSLT or just want to get at the code of an XML document, oXygen is my choice.

Get the scoop on oXygen from Read more about XMetaL at

Update 6/15/09:
I’m thrilled to report that two deficiencies I reported in oXygen 9 are now supported in the latest version of oXygen — 10.2.

  • In Author view, tables are now displayed in WYSIWYG format. Just like in your favorite word processor, you can drag and drop column rulings to resize columns. After you resize columns, the colwidth attribute in the colspec element is updated automatically. This is much easier than manually editing the colwidth.
  • In Author view, the tags are now displayed on one line when possible. Before, the tags were always on separate lines from the content.

Two more reasons to love oXygen!

Read More

Death to Recipes!

I love food. I enjoy cooking and I especially enjoy eating. One of my favorite web sites is, and the kitchen shelf devoted to cookbooks sags alarmingly. Many Saturday mornings, you will find me here.

But I am not happy about how recipes have insinuated themselves into my work life. For some reason, the recipe is the default example of structured content. Look at what happens when you search Google for xml recipe example. Recipes are everywhere, not unlike high fructose corn syrup. Unfortunately, I am not immune to the XML recipe infiltration myself.

I understand the appeal. Recipes are:

  • highly structured content
  • well understood

But I think the example is getting a little tired and wilted. Let’s try working with something new. Try out a new kind of lettuce, er, example. This week, I’m trying to write a very basic introduction to structured authoring, and I’m paralyzed by my inability to think of any non-recipe examples.

I’m considering using a glossary as an example. After all, it’s a highly structure piece of content whose organization is well understood. Maybe I’ll use food items as my glossary entries. Baby steps…

PS It’s totally unrelated, but this article about two chefs eating their way through Durham (“nine restaurants in one night, at least five hours of eating and drinking”) is quite fun.

Read More

Essential tools of an XML workflow in the publishing industry

by Sheila Loring

Communications from DMN provided a link to a webcast on Essential Tools of an XML Workflow. The webcast focuses on the book publishing industry. It’s interesting to hear that some publishing houses still allow authors and editors to use Microsoft Word. These folks are often viewed as incapable of learning an XML authoring tool. Many times the Word content is sent to an indexer for tagging.

The companies I’ve worked with don’t give their employees the choice of publishing tools, but if you’re Stephen King, you probably won’t be forced to use an XML tool.

Technical writers, if you know how to work with XML, your skills are portable to publishing houses. Don’t overlook this in a job search.

Read More

WritersUA: DITA pilot techniques

Mark Wallis of IBM ISS on how to run a successful DITA pilot. Some great information in this presentation on how to reduce risks.

He recommends selecting your pilot project based on the following items:

  • Right timeframe — don’t choose the project that has an imminent release
  • Choose a manageable documentation set size
  • Reduce risk by avoiding the strongest (or most critical) product
  • Identify a product with a known need to improve the user experience

They had one person out of a group of twelve, a “senior in name only” writer, leave because of this transition.

The ideal team for a pilot will need cross-functional and complementary skills:

  • Project management skills
  • Tools and technology strengths
  • Product knowledge and understanding
  • Architecture and design skills
  • Editor for standards and styles

Some advice on planning your content. (And it’s worth noting here that these apply to good writing and topic-oriented content rather than to DITA tools.)

  • No autopilot writing
  • Don’t just migrate existing content; you’ll get trapped in old paradigms (this assumes that existing content does not fit the DITA topic paradigm)
  • Perform use case analysis and task analysis
  • Determine the critical scenarios to document
  • Focus on tasks; backfill supporting information as needed

Some interesting discussion of “task support clusters,” which include conceptual overviews, related tasks, deep concept, and reference information. (Michael Hughes did a presentation on this earlier today, which I unfortunately was not able to attend.)

They set up a DITA War Room in a small conference room and met at least daily (1.5 to 2 hours per day. Yikes). They set weekly goals and used small tasks to build momentum.

There was also heavy use of an internal wiki to put up initial “straw man” design, then revise, comment, and discuss.

Layering deliverables
Implementation deliverables were split out into smaller tasks, such as:

  • Creating topic files, links, and navigation
  • Testing links from code and navigation
  • Creating task and reference topics
  • Validating help against the user interface
  • Creating concept topics for principles, guidelines, and best practices (“deep concept”)
  • Validating content in the expert community

For the third time, he points out that they are no longer documenting how to use a check box, so I guess I’ll mention it.

Choosing the DITA toolset

Task Modeler (free) for building and managing ditamaps, defining relationships between topics, and creating skeleton topics (stub files).

DITA-compliant editor to edit your topics.

Compiler (part of open source toolkit). Compiler? What are they compiling? HTML Help? Oh. He just referred to Ant as a compiler. Ohhhhhkay.

Proof of concept

They picked a subset of the pilot to do the proof of concept.

The presenter’s boss is quoted as saying, “There’s no such thing as bad weather, only insufficient clothing.” I’m guessing that she’s never been to Minnesota in winter.

The objectives for the proof of concept:

  • Learn and evaluate tools
  • Address technical obstacles
  • Specify end-to-end requirements

They learned that deliverable formats matter because they must deliver several different formats.

Managing costs

Purchase toolsets only for pilot team.

After completing proof of concept (successfully!), invest in tools for the remaining writers.


They used their wiki to capture conventions and guidelines.

Improving acceptance

They paid attention to the change management issues. He doesn’t mention it here, but I would assume that the combination of an acquisition by IBM plus the requirement to change the authoring environment could have caused significant angst. Their approach included presentations, wiki content, email discussions, and online training.

At the point of transition, DITA boot camp was offered.

They used collaborative walkthroughs, or reviews, to help standardize their content development. Interesting. This sounds as though it could be a) threatening and b) an unbelievable time sink. But just maybe it might also c) help improve the content.

Other lessons learned

Think more, write less. (Don’t document the obvious, don’t document common user interface convention, write only if you’re really adding value.)

Don’t squander your ignorance. (If something makes you stumble in the interface, that will probably also cause problems for your users, so capture it.)

The more structured your content, the easier the transition to DITA.

Documenting the obvious teaches readers to ignore your text, so don’t document the obvious.

The handouts are available here:

Read More