The state of structure in technical communication
In early 2009, Scriptorium Publishing conducted a survey to measure how and why technical communicators are adopting structured authoring.
Want to get me riled up? You can easily achieve that by making me feel stupid while reading your blog.
I read a lot of blogs about technology, and I’ll admit that I’m on the periphery of some of these blogs’ intended audiences. That being said, there is no excuse for starting a blog entry like this:
Everyone’s heard of Gordon Moore’s famous Law, but not everyone has noticed how what he said has gradually morphed into a marketing message so misleading I’m surprised Intel doesn’t require people to play the company jingle just for mentioning it.
Well, I must not be part of the “everyone” collective because I had to think long and hard about “Gordon Moore’s famous law,” and I drew a blank. (Here’s a link for those like me who can’t remember or don’t know what Moore’s Law is.)
Making a broad generalization such as “everyone knows x” is always dangerous. This is true in blog writing as well as in technical writing. In our style guide, we have a rule that writers should “not use simple or simply to describe a feature or step.” If you label something as simple, you’re guaranteed to offend someone who doesn’t understand the concept. For example, while editing one of our white papers about the DITA Open Toolkit, I saw the word “simply” and took great delight in striking through it. From where I stand, there is little that is “simple” about the toolkit, particularly when you’re talking about configuring output.
Don’t get me wrong: I’m not saying that a blog entry, white paper, or user manual about very technical subjects has to explain every little thing. You need to address the audience at the correct level, which can be a delicate balancing act with highly technical content: overexplaining can also offend the reader. For example, in a user manual, it’s probably wise to explain up front the prerequisite knowledge. Also consider offering resources where the reader can get that knowledge: that way, you get around explaining concepts but still offer assistance to readers who need it.
In the case of online content and blog entries, you can link to definitions of terms and concepts: readers who need help can click the links to get a quick refresher course on the information, and those who don’t can keep on reading. The author of the blog post in question could have inserted a link to Moore’s Law. Granted, he does define the law in the second paragraph, but he lost me with the “everyone has heard” bit at the start.
That “everyone has heard” assumption brings me back to my primary point: don’t make your readers feel stupid, particularly by making sweeping assumptions about what “everyone” knows or by labeling something as “simple.” Insulted readers move on—and may never come back.
At Scriptorium earlier this week, we all watched live blogs of the iPad announcement. (What else would you expect from a bunch of techies?) One feature of the iPad that really got us talking (and thinking) is its support of the ePub open standard for ebooks.
ePub is basically a collection of XHTML files zipped up with some baggage files. Considering a lot of technical documentation groups create HTML output as a deliverable, it’s likely not a huge step further to create an ePub version of the content. There is a transform for DocBook to ePub; there is a similar effort underway for DITA. You can also save InDesign files to ePub.
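To give a sense of how small that “baggage” is: an ePub archive contains a mimetype file, a META-INF/container.xml that points at an OPF package file, and the XHTML content itself. Here is what a minimal container.xml looks like (the OEBPS path and file name are just conventional examples, not requirements):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
           xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <!-- Points the reading system at the OPF package file,
         which lists the XHTML files and their reading order. -->
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
```

If your publishing pipeline already produces XHTML, most of the remaining work is generating the package manifest and zipping everything up.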
While the paths to creating an ePub version seem pretty straightforward, does it make sense to release technical content as an ebook? I think a lot of the same reasons for releasing online content apply (less tree death, no printing costs, and interactivity, in particular), but there are other issues to consider, too: audience, how quickly ebook readers and software become widespread, how the features and benefits of the format stack up against those of PDF files and browser-based help, and so on. And there’s also the issue of actually leveraging the features of an output instead of merely doing the minimum of releasing text and images in that format. (In the PDF version of a user manual, have you ever clicked an entry in the table of contents only to discover the TOC has no links? When that happens, I assume the company that released the content was more interested in using the format to offload the printing costs on to me and less interested in using PDF as a way to make my life easier.)
The technology supporting ebooks will continue to evolve, and there likely will be a battle to see which ebook file format(s) will reign supreme. (I suspect Apple’s choice of the ePub format will raise that format’s prospects.) While the file formats get shaken out and ebooks continue to emerge as a way to disseminate content, technical communicators would be wise to determine how the format could fit into their strategies for getting information to end users.
What considerations come to your mind when evaluating the possibility of releasing your content in ePub (or other ebook) format?
When I first started importing DITA and other XML files into structured FrameMaker, I was surprised by the excessive whitespace that appeared in the files. Even more surprising (in FrameMaker 8.0) were the red comments displayed via the EDD that said that some whitespace was invalid (these no longer appear in FrameMaker 9).
The whitespace was visible because of an odd decision by Adobe to handle all XML whitespace as if it were significant. (XML divides the world into significant and insignificant whitespace; most XML tools treat whitespace as insignificant except where necessary…think <codeblock> elements.) This approach to whitespace exists in both FrameMaker and InDesign.
At first I handled the whitespace on a case-by-case basis, removing it by hand or through regular expressions. Eventually, I realized this was a more serious problem and created an XSL transform to eliminate the white space as a part of preprocessing. By using XSL that was acceptable to Xalan (not that hard), the transform can be integrated into a FrameMaker structured application.
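The core of such a preprocessing transform is an identity template combined with whitespace stripping. A minimal sketch of the idea (not the actual downloadable stylesheet, and assuming <codeblock> and friends are the only elements where whitespace matters in your content model):

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Treat whitespace-only text nodes as insignificant everywhere... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except in elements where whitespace really is significant. -->
  <xsl:preserve-space elements="codeblock pre lines"/>

  <!-- Identity template: copy everything else through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because this is plain XSLT 1.0, it runs under Xalan and can therefore be wired into a FrameMaker structured application as a preprocessing step.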
I figured this whitespace problem must be affecting (and frustrating) more than a few of you out there, so I made the stylesheet available on the Scriptorium web site. I also wrote a white paper, “Removing XML whitespace in structured FrameMaker documents,” that describes the XSL that went into the stylesheet and how to integrate it with your FrameMaker structured applications.
The white paper is available on the Scriptorium web site. Information about how to download the stylesheet is in the white paper.
If the stylesheet and white paper are useful to you, let us know!
Lately, our webcasts are getting great participation. The December event had 100 people in attendance (the registered number was even higher), and the numbers for the next few months are strong, as well. Previous webcasts had attendance of A Lot Less than 100. What changed? The webcasts are now free. (Missing an event? Check our archives.)
We’re going in a similar direction with white papers. We charge for some content, but we also offer a ton of free information.
The idea is that free (and high-quality) information raises our profile and therefore later brings in new projects. I’m not so sure, though, that we have any evidence that supports this theory yet.
So, I thought I’d ask my readers. Do you evaluate potential vendors based on offerings such as webcasts and white papers? Are there other, more important factors?
PS Upcoming events, including several DITA webcasts, are listed on our events page.
It’s time for my (apparently biennial) predictions post. For those of you keeping score at home, you can see the last round of predictions here. Executive summary: no clear leader for DITA editing, reuse analyzers, Web 2.0 integration, global business, Flash. In retrospect, I didn’t exactly stick my neck out on any of those. Let’s see if I can do better this year.
Everyone else is talking about the cloud, but what about tech comm? Many content creation efforts will shift into the cloud and away from desktop applications and their monstrous footprints (I’m looking at you, Adobe). When your content lives in the cloud, you can edit from anywhere and be much less dependent on a specific computer loaded with specific applications.
I expect to see much more content creation migrate into web applications, such as wiki software and blogging software. I do not, at this point, see much potential for the various “online word processors,” such as Buzzword or Zoho Writer, for tech comm. Creating documents longer than four or five pages in these environments is painful.
In the ideal universe, I’d like to see more support for DITA and/or XML in these tools, but I’m not holding my breath for this in 2010.
From what we are seeing, the rate of XML adoption is steady or even accelerating. But the rationale for XML is shifting. In the past, the benefits of structured authoring—consistency, template enforcement, and content reuse—have been the primary drivers. But in several newer projects, XML is a means to an end rather than a goal—our customers want to extract information from databases, or transfer information between two otherwise incompatible applications. The project justifications reach beyond the issues of content quality and instead focus on integrating content from multiple information sources.
Is the hype about social media overblown? Actually, I don’t think so. I did a webcast (YouTube link) on this topic in December 2009. The short version: Technical communicators must now compete with information being generated by the user community. This requires greater transparency and better content.
My prediction is that a strategy for integrating social media and official tech comm will be critical in 2010 and beyond.
The days of the hermit tech writer are numbered. Close collaboration with product experts, the user community, and others will become the norm. This requires tools that are accessible to non-specialists and that offer easy ways to manage input from collaborators.
There are a couple of interesting changes in language:
What are your predictions for 2010?
Other interesting prediction posts:
Formatting Objects (FO) processors (FOP, in particular) often fail with memory errors when processing very large documents for PDF output. Typically in XSL-FO, the body of a document is contained in a single fo:page-sequence element. When FO documents are converted to PDF output, the FO processor holds an entire fo:page-sequence in memory to perform pagination adjustments over the span of the sequence. Very large page counts can result in memory overflows or Java heap space errors.
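One common workaround is to emit a separate fo:page-sequence per chapter or top-level topic instead of one sequence for the entire body, so the processor can paginate and release each sequence in turn. Schematically (the master name and page dimensions here are hypothetical placeholders):

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="body-page"
                           page-height="11in" page-width="8.5in"
                           margin="1in">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- One page-sequence per chapter: the processor only has to hold
       one chapter in memory at a time, not the whole book. -->
  <fo:page-sequence master-reference="body-page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Chapter 1 content...</fo:block>
    </fo:flow>
  </fo:page-sequence>
  <fo:page-sequence master-reference="body-page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Chapter 2 content...</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

The trade-off is that features spanning the whole body, such as continuous pagination adjustments across sequences, become the stylesheet's responsibility.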
In early 2009, Scriptorium Publishing conducted a survey to measure how and why technical communicators are adopting structured authoring.
Of the 616 responses:
This report summarizes our findings on topics including the reasons for implementing structure, the adoption rate for DITA and other standards, and the selection of authoring tools.
Download
(2 MB, 56 pages)
Discuss this document in our forum
In a posting a few weeks ago I discussed how to ignore the DOCTYPE declaration when processing XML through XSL. What I left unaddressed was how to add the DOCTYPE declaration back to the files. Several people have told me they’re tired of waiting for the other shoe to drop, so here’s how to add a DOCTYPE declaration.
First off: the easy solution. If the documents you are transforming always use the same DOCTYPE, you can use the doctype-public and doctype-system attributes in the <xsl:output> directive. When you specify these attributes, XSL inserts the DOCTYPE automatically.
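For example, if every file you process is a DITA concept, a single <xsl:output> directive handles the whole job (the public and system identifiers shown are the standard DITA concept values; substitute your own as needed):

```xml
<!-- Every output file gets the same fixed DOCTYPE declaration. -->
<xsl:output method="xml"
            doctype-public="-//OASIS//DTD DITA Concept//EN"
            doctype-system="concept.dtd"/>
```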
However, if the DOCTYPE varies from file to file, you’ll have to insert the DOCTYPE declaration from your XSL stylesheet. In DITA files (and in many other XML architectures), the DOCTYPE is directly related to the root element of the document being processed. This means you can detect the name of the root element and use standard XSL to insert a new DOCTYPE declaration.
Before you charge ahead and drop a DOCTYPE declaration into your files, understand that the DOCTYPE declaration is not valid XML. If you try to emit it literally, your XSL processor will complain. Instead, you’ll have to:
- represent the markup characters < and > with the entities &lt; and &gt;, and
- output the declaration through <xsl:text disable-output-escaping="yes"> so those entities reach the output file as literal characters.
There are at least two possible approaches for adding DOCTYPE to your documents: use an <xsl:choose> statement to select a DOCTYPE, or construct the DOCTYPE using the XSL concat() function.
To insert the DOCTYPE declaration with an <xsl:choose> statement, use the document’s root element to select which DOCTYPE declaration to insert. Note that the entities &gt; and &lt; aren’t HTML errors in this post; they are what you need to use. Also note that the DOCTYPE statement text in this template is left-aligned so that the output DOCTYPE declarations will be left-aligned. Most parsers seem to tolerate whitespace before the DOCTYPE declaration, but I prefer to err on the side of caution:
<xsl:template match="/">
  <xsl:choose>
    <xsl:when test="name(node()[1]) = 'topic'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd"&gt;
</xsl:text>
    </xsl:when>
    <xsl:when test="name(node()[1]) = 'concept'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"&gt;
</xsl:text>
    </xsl:when>
    <xsl:when test="name(node()[1]) = 'task'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd"&gt;
</xsl:text>
    </xsl:when>
    <xsl:when test="name(node()[1]) = 'reference'">
      <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd"&gt;
</xsl:text>
    </xsl:when>
  </xsl:choose>
  <xsl:apply-templates select="node()"/>
</xsl:template>
The preceding example contains statements for the topic, concept, task, and reference topic types; if you use other topic types, you’ll need to add additional statements. Rather than write a statement for each DOCTYPE, a more general approach is to process the name of the root element and construct the DOCTYPE declaration using the XSL concat() function.
<xsl:variable name="ALPHA_UC" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="ALPHA_LC" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="NEWLINE" select="'&#x0A;'"/>

<xsl:template match="/">
  <xsl:call-template name="add-doctype">
    <xsl:with-param name="root" select="name(node()[1])"/>
  </xsl:call-template>
  <xsl:apply-templates select="node()"/>
</xsl:template>

<!-- Create a doctype based on the root element -->
<xsl:template name="add-doctype">
  <xsl:param name="root"/>
  <!-- Create an init-cap version of the root element name. -->
  <xsl:variable name="initcap_root">
    <xsl:value-of
      select="concat(translate(substring($root,1,1),$ALPHA_LC,$ALPHA_UC),
                     translate(substring($root,2),$ALPHA_UC,$ALPHA_LC))"/>
  </xsl:variable>
  <!-- Build the DOCTYPE by concatenating pieces.
       Note that XSL syntax requires you to use the &quot; entity for
       quotation marks ("). -->
  <xsl:variable name="doctype"
    select="concat('!DOCTYPE ',
                   $root,
                   ' PUBLIC &quot;-//OASIS//DTD DITA ',
                   $initcap_root,
                   '//EN&quot; &quot;',
                   $root,
                   '.dtd&quot;')"/>
  <xsl:value-of select="$NEWLINE"/>
  <!-- Output the DOCTYPE surrounded by &lt; and &gt;. -->
  <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
  <xsl:value-of select="$doctype"/>
  <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>
The one caveat about this approach is that it depends on a consistent portion of the public ID form (“-//OASIS//DTD DITA “). If there are differences in the public ID for your various DOCTYPE declarations, those differences may complicate the template.
So there you have it: DOCTYPEs in a flash. Just remember to use disable-output-escaping="yes" and use entities where appropriate, and you’ll be fine.
As technical communicators, our ultimate goal is to create accessible content that helps users solve problems. Focusing on developing quality content is the priority, but you can take that viewpoint to an extreme by saying that content-creation tools are just a convenience for technical writers:
The tools we use in our wacky profession are a convenience for us, as are the techniques we use. Users don’t care if we use FrameMaker, AuthorIt, Flare, Word, AsciiDoc, OpenOffice.org Writer, DITA or DocBook to create the content. They don’t give a hoot if the content is single sourced or topic based.
Sure, end users probably don’t know or care about the tools used to develop content. However, users do have eagle eyes for spotting inconsistencies in content, and they will call you out for conflicting information in a heartbeat (or worse, just abandon the official user docs altogether for being “unreliable”). If your department has implemented reuse and single-sourcing techniques that eliminate those inconsistencies, your end users are going to have a lot more faith in the validity of the content you provide.
Also, a structured authoring process that removes the burden of formatting content from the authoring process gives tech writers more time to focus on providing quality content to the end user. Yep, the end user doesn’t give a fig that the PDF or HTML file they are reading was generated from DITA-based content, but because the tech writers creating that content focused on just writing instead of writing, formatting, and converting the content, the information is probably better written and more useful.
All this talk about tools makes me think about the implements I use for gardening. A few years ago, I planted a young dogwood tree in my back yard. I could have used a small gardening trowel to dig the hole, but instead, I chose a standard-size shovel. Even though the tree had no opinion on the tool I used (at least I don’t think it did!), it certainly benefited from my tool selection. Because I was able to dig the hole and plant the tree in a shorter amount of time, the tree was able to develop a new root system in its new home more quickly. Today, that tree is flourishing and is about four feet taller than it was when I planted it.
The same applies to technical content. If a tool or process improves the consistency of content, gives authors more time to focus on the content, and shortens the time it takes to distribute that content, then the choice and application of a tool are much more than mere “conveniences.”