Skip to main content

Tag:

XML

The ABCs of XML

STC Intercom, September/October 2009

XML is rapidly becoming part of the required knowledge for technical communicators. This article discusses the three most important reasons that you should consider XML: automation, baseline architecture, and consistency.

Download the PDF PDF file (144K)

Read More
News

XML 101

My latest XML Strategist article, “The ABCs of XML,” is available as a PDF file (144K). This article was originally published in the September/October 2009 issue of Intercom.

The technical side of XML is not much more difficult than HTML; if you can handle a few HTML angle brackets you can learn XML. […] If […] you don’t like using styles and
prefer to format everything as you go, you are going to loathe structured authoring. 

Just trying to make sure that there are no surprises. The article itself is a very basic introduction to the principles that make XML important for technical communication: automation, baseline architecture (sorry…I had problems with B), and consistency.

Read More
Opinion

A strident defense of mediocre formatting

In addition to a gratuitous (and entertaining) swipe at “noisome” DITA “fanboys,” Roger Hart argues that we need to reconsider the disadvantages of automated formatting:

The thing is, [separation of content and formatting has] all been taken rather stridently to heart in certain quarters, leading to a knee jerk reaction whenever author-controlled formatting/pagination/lineation is mentioned as anything other than bleak, sulphurous devilry. This is twaddle. […]

Uncertainty in meaning is anathema to user intelligibility. If we’re going to make sure we’re not writing poetry, there’s definitely value in having poetry’s level of control over semantic blocks.

Of course, it’s fully possible that this is an expensive distraction.

Possible? It’s definitely expensive. It’s possible that it’s a distraction.

I think Hart perhaps unintentionally put his finger on the real issue: value. How much value (in the form of improved comprehension) is added to a technical document when you are able, in the words of commenter Brian Harris, to “lovingly handcraft” each page?

How much value (in the form of cost avoidance) is added to an organization when you are able to spit out a reasonably formatted document in a few minutes?

Actually, I have a different question. How far should we take this argument? Here’s an example of the pinnacle of handcrafting:

Book of Kells image
Can we all agree that this might perhaps take handcrafting a little too far?

Compared to the Book of Kells (above), the Gutenberg Bible looks quite pedestrian:

Gutenberg Bible image

You can just imagine the scribes with their quills, lapis, gold leaf, and other implements muttering, “That Gutenberg and his noisome fanboys. He can’t even render two colors without our help. Poser. It’ll never last.”

Formatting automation removes cost from the process of creating and delivering content. For technical documents that change often and are perhaps delivered in multiple languages, it removes a lot of cost. Let’s assume that handcrafted pages can improve ease of reading and comprehension with careful copy-fitting and adjusted spacing (Hart’s article mentions “headings, line breaks, intra-word, etc”). This increases the cost of the content.

What happens when content is expensive? Fewer people get to see it.

Books in Europe went from 50000 before Gutenberg to 12 million 50 years later.

I think we can all agree that e-books offer none of the typographic sophistication in question here. Bill Gates (yes, that Bill Gates) wrote in 1999:

It is hard to imagine today, but one of the greatest contributions of e-books may eventually be in improving literacy and education in less-developed countries. Today people in poor countries cannot afford to buy books and rarely have access to a library. 

Essentially, we can produce documents inexpensively and give more people access to them as a direct result of lower cost, or we can climb on our typographic high horse and whine about word spacing.

I’m with the noisome fanboys.

Read More
Tools

Ignoring DOCTYPE in XSL Transforms using Saxon 9B

Recently I had to write some XSL transforms in which I wanted to ignore the DOCTYPE declarations in the source XML files. In one case, I didn’t have access to the DTD (and the files wouldn’t have validate even if I did). In the other case, the XML files were DITA files, but I had no need or interest in validating the files; I simply needed to run a transform that modified some character data in the files.

In the first case, I ended up writing a couple of SED scripts that removed and re-inserted the DOCTYPE declaration. By the time I encountered the second case, I wanted to do something less ham-fisted, so I started investigating how to direct Saxon to ignore the DOCTYPE declaration.

My first thought was to use the -x switch in Saxon. Perhaps I didn’t use it correctly, but I couldn’t get it to work. Even though I was using a non-validating parser (Piccolo), Saxon kept telling me that the DTD couldn’t be found.

I went back to the drawing board (aka Google) and found a note from Michael Kay that said, “to ignore the DTD completely, you need to use a catalog that redirects the DTD reference to some dummy DTD.” Michael provided a link to a very useful page in the Saxon Wiki that discussed using a catalog with Saxon. After a bit of experimentation, I got it working correctly. In this blog post, I’ve distilled the information to make it useful to others who need to ignore the DOCTYPE in their XSL.

Before I describe the catalog implementation, I’d like to point out a simple solution. This solution works best when a set of XML files are in a single directory and all files use the same DOCTYPE declaration in which the system ID specifies a file:

<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">

In this case, you don’t need a catalog. It’s easier to create an empty file named “topic.dtd” (a dummy DTD) and save it in the same directory as the XML files. The XML parser looks first for the system ID; if it finds a DTD file, it uses it. Case closed.

However, there are many cases in which this simple solution doesn’t work. The system ID (“topic.dtd” in the previous example) might specify a path that cannot be reproduced on your machine…or the XML files could be spread across multiple directories…or there could be many different DOCTYPEs…or…

In these cases, it makes more sense to set up a catalog file. To specify a catalog with Saxon, you must use the XML Commons Resolver from Apache (resolver.jar). You can download the resolver from SourceForge. The good thing is, if you have the DITA Open Toolkit installed on your machine, you already have a copy of the resolver.jar file. The file is in %DITA-OT%libresolver.jar. You specify the class path for the resolver in the Java command using the -cp switch (shown below).

The resolver requires you to specify a catalog.xml file, in which you map the the public ID (or system ID) in the DOCTYPE declaration to a local DTD file. The catalog.xml file I created looks like this:

<catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<public publicId="-//OASIS//DTD DITA Topic//EN" uri="dummy.dtd"/>
<public publicId="-//OASIS//DTD DITA Concept//EN" uri="dummy.dtd"/>
<public publicId="-//OASIS//DTD DITA Task//EN" uri="dummy.dtd"/>
<public publicId="-//OASIS//DTD DITA Reference//EN" uri="dummy.dtd"/>
</catalog>

Note that the uri attribute in each entry points to a dummy DTD (an empty file). The file path used for the dummy.dtd file is relative to the location of the catalog file.

Putting it all together, I created a DOS batch file to run Java and invoke Saxon:

java -cp c:saxon9saxon9.jar;C:DITA-OT1.4.3libresolver.jar ˆ
-Dxml.catalog.files=catalog.xml ˆ
net.sf.saxon.Transformˆ
-r:org.apache.xml.resolver.tools.CatalogResolver ˆ
-x:org.apache.xml.resolver.tools.ResolvingXMLReader ˆ
-y:org.apache.xml.resolver.tools.ResolvingXMLReader ˆ
-xsl:my_transform.xsl ˆ
-s:my_content.xml

The Java -cp switch adds class paths for the saxon.jar and resolver.jar files. The -D switch sets the system property xml.catalog.files to the location of the catalog.xml file.

The switches following the Java class (net.sf.saxon.Transform) are Saxon switches.

  • -r – class of the resolver
  • -x – class of the source file parser
  • -y – class of the stylesheet parser

Note, I’m using Windows (DOS) syntax here. If you are using Unix (Linux, Mac), separate the paths in the class path with a colon (:) and use the backslash () as a line continuation character.

When you run Saxon this way, you’ll notice two things: first, Saxon doesn’t complain about the DTD (yay!), but secondly, there is no DOCTYPE declaration in the output. I’ll address how to add the DOCTYPE declaration back to the output XML file in my next blog post.

Read More
Opinion

HTML 5: Browser Wars Reprise?

Recently, I ran across an article by Rob Cherny in Dr. Dobb’s Journal. He suggests that the added features in HTML 5 combined with an end to the development of XHTML point to a brighter standards-based future. He sees closed solutions like Flash, Silverlight, and JavaFX being supplanted directly by HTML 5 code. His view is that the web owes its success to standards.

It’s tempting to agree. Standards certainly allow for collaborative growth. Though I’m not the least bit convinced that collaborative growth is the foundation of the web’s success. I believe that the web’s incredible success is really traceable to the simplicity and flexibility of HTML. Each new version takes us further from that simplicity.

Through the browser war years we saw the impact of new features in HTML—incompatibility among browsers. My sense is that the success of Flash is largely due to the fact that Adobe owns both ends of the problem. They create the tools that generate Flash code as well as the viewer. Web developers can pretty much assume that what they see, when they build a Flash-based solution, is what the end user will see.

I fear that we will head right back to the bad old days if HTML 5’s complex capabilities are widely employed. I suspect that ‘wait and see’ will last a pretty long time. I have other concerns about HTML 5—more on that later. What do you think—will your organization take advantage of these new capabilities as soon as they are available?

Read More
Opinion

Font snobbery? (I don’t think so.)

For its 2010 catalog, IKEA used Verdana font instead of the customized Futura it’s used for years. To say people noticed the switch would be an understatement:

“Ikea, stop the Verdana madness!” pleaded Tokyo’s Oliver Reichenstein on Twitter. “Words can’t describe my disgust,” spat Ben Cristensen of Melbourne. “Horrific,” lamented Christian Hughes in Dublin. The online forum Typophile closed its first post on the subject with the words, “It’s a sad day.” On Aug. 26, Romanian design consultant Marius Ursache started an online petition to get Ikea to change its mind. That night, Verdana was already a trending topic on Twitter, drawing more tweets than even Ted Kennedy.

As a fan of IKEA and its products, I can understand the reaction. If you showed me a page out of an IKEA catalog with just text and prices (and no pictures or funky product names, of course!), I could tell you in a heartbeat that the content was from IKEA.

Verdana may be easier to read if you’re looking at the IKEA catalog online, but that font lacks the designer-y flair of Futura. Because IKEA is known for its affordable cutting-edge design, Verdana just doesn’t seem to quite fit the bill.

This situation reminds me of a comment a friend made about a failed hotel in Raleigh, NC. He said, “Did you see the awful Brush Script on the hotel’s sign? Those people clearly didn’t know how to run a business.” I doubt the Brush Script killed the hotel, but that bad design decision gave my friend (and probably many others) a very unfavorable impression about the company.

Earlier this week, Sarah O’Keefe and I were doing some web research and came upon a web site that used Comic Sans. My reaction to that site was less than positive. I loathe Comic Sans, and I find it hard to take any company seriously that uses a font that emulates text in a comic book.

A company’s use of fonts can become iconic–think about the fonts used by Coca-Cola and FedEx in their logos, for example. Font choice does have an effect on how people perceive content, a product, or a company.

I don’t think reactions to fonts are limited to just those who work in publishing and design. No snobbery here at all. (But if noticing fonts makes me a card-carrying font snob, you better believe that card would have no Comic Sans on it.)

For more about the impact of fonts, check out the documentary Helvetica:

Read More