To understand how XML changes technical communication, we need to step back and look at how the rise of information technology has changed the content development process. Through the 1970s, most technical communication work had separate writing, layout, and production phases. Authors wrote content, typically in longhand or on typewriters. Typesetters would then rekey the information to transfer it into the publishing system. The dedicated typesetting system would produce camera-ready copy, which was then mechanically reproduced on a printing press.
In a desktop publishing environment, authors could type information directly into a page layout program and set up the document design. This eliminated the inefficient process of re-entering information, and it often shifted the responsibility for document design to technical communicators.
XML is now becoming an accepted (or even expected) foundation for technical content. In an XML-based environment, authors must follow the required structure, and formatting is applied to the content automatically during the publishing process. This takes away the author’s direct control over formatting; instead, the author encodes semantic information via XML elements. A <note> element can be rendered differently in different outputs—perhaps the print output uses a shaded background and the HTML output has no shading but includes lines above and below the note content.
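As a minimal sketch of this separation between content and formatting, the Python example below renders the same `<note>` element two ways. The function names, the CSS class name, and the use of a text border to stand in for a shaded print background are all illustrative assumptions, not part of any particular publishing toolchain.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source content: the author marks up meaning, not appearance.
SOURCE = "<topic><p>Install the pump.</p><note>Unplug the pump first.</note></topic>"


def render_html(note: ET.Element) -> str:
    """Render a note for HTML output: rules above and below, no shading."""
    text = escape(note.text or "")
    return f'<hr/><div class="note">{text}</div><hr/>'


def render_print(note: ET.Element) -> str:
    """Render a note for print; a text border stands in for a shaded box."""
    text = note.text or ""
    border = "#" * (len(text) + 4)
    return f"{border}\n# {text} #\n{border}"


root = ET.fromstring(SOURCE)
note = root.find("note")

# The same element yields two different presentations.
html_output = render_html(note)
print_output = render_print(note)
```

The author only decides that the text is a note. Each output format decides what a note looks like.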
The shift toward automation
Componentization and automation have transformed many manufacturing processes, and this shift is now taking place in technical communication. In an artisanal process, you craft products individually, and each product is slightly different (for example, hand-thrown pottery). A manufacturing process replaces this approach with smaller, discrete, and repeatable tasks. The result is more consistent and less expensive, but it has less artistic value. In technical communication, content creators are shifting from creating books to creating units of content, usually topics. Each content creator’s topics must be compatible with topics that others are creating, so that the topics can later be combined into a single, unified deliverable.
For individual contributors or small workgroups, the componentized approach is not particularly compelling—the cost savings are minimal. But as you add more content creators, the efficiency improvements from a component-based approach start to add up to significant decreases in production time and cost. The final small tweaks to pagination and line breaks that many authors made in a final production edit are eliminated. There is an ongoing debate over whether XML-based publishing lowers the quality of the finished product. It’s my opinion that a traditional desktop publishing process, when perfectly executed, results in the highest possible document quality. An automated workflow can achieve perhaps 90 percent of the quality of a perfect manual workflow. If that last 10 percent is critical to your customers, you need to stay with a workflow that includes a final production edit. For the vast majority of technical publishing projects, however, the automated XML approach will result in acceptable quality.
NOTE: Many people look at the default PDF output from the DITA Open Toolkit and dismiss the toolkit as unacceptable. This is like reviewing a publishing tool’s default templates and deciding that the tool is unusable because the templates don’t meet your needs. The DITA Open Toolkit is a great starting point for generating HTML, PDF, and other output formats, but extensive configuration is required to produce attractive results.
But how much value is added by crafting pages? Do readers notice copyfitting niceties, balanced headings, and improved line breaks? Moving to automation will, especially in the short term, result in formatting compromises, but there’s a difference between giving up on fine-tuning line breaks and having utterly unacceptable formatting—like a heading at the bottom of a page.
Changes in management of technical communication
When you transition a department or workgroup to XML-based authoring, you will need to change how you manage that department, particularly in the following areas:
- Transparency and metrics
- Skill sets
The transition from an artisanal, unique process to a more predictable manufacturing process introduces more accountability into the content creation process for writers because it’s easier to measure content quality. Storing content in XML opens up content for assessment across several dimensions, including writing, formatting, and search performance.
To evaluate writing quality, you need to measure whether content is clear, concise, and audience-appropriate. Those factors are not controlled by the underlying authoring tools. An XML-based authoring system can, however, help to improve two other factors: accuracy and timeliness. XML, structured authoring, and content reuse reduce content duplication and therefore improve accuracy by eliminating problems with copying and pasting or with maintenance omissions. Content creators must still write accurate content, but at least they do not have to worry about propagating that content across multiple authoring tools or copying and pasting it into several different locations.
XML can also help to ensure that content is current. An XML-based, automated publishing process allows for faster publishing or republishing. This, in turn, enables you to publish more content more frequently, resulting in more up-to-date information for your audience.
The general assumption is that XML results in ugly content, especially ugly printed/PDF content. In fact, there’s no technical reason that requires XML-based publishing to be ugly. But the process of configuring an XML implementation to produce attractive content is challenging. As a result, many people fall back on the default formatting (for example, the default PDF output from the DITA Open Toolkit). That formatting is definitely unattractive.
XML-based formatting has some similarities with a template-driven authoring environment. Once you define the formatting requirements and build the system to support them, you will get formatting that is exactly what you specify. Furthermore, in an XML-driven environment, you do not have to rely on authors to apply proper styles or to remember to change the master pages. The entire formatting process is automated. As a result, a well-designed, well-executed formatting specification will produce attractive output that consistently and correctly delivers the specified formatting. The implication is that an author who tags content correctly will get the correct formatting results.
Automated formatting does have limitations. Typically, you cannot do the following without manual intervention:
- Mixed column layouts on a single page
- Portrait and landscape text on a single page
- Formatting components with arbitrary positioning, such as sidebars or pull quotes
High-end magazines may require these types of sophisticated layouts, but they are rare in technical documentation.
Search performance has three different facets:
- Searchability
- Findability
- Discoverability
XML affects the latter two.
Searchable content is information that is available to a search engine. So, searchability is unrelated to the authoring process—whether content is searchable is a policy decision. Searchability is a prerequisite for findability and discoverability.
Findable content is information that shows up in search results. In addition to writing high-quality content, making content findable involves paying attention to search engine optimization. Some of this work is outside the purview of XML; for example, informative titles, such as “Overview of XYZ Widget Configuration,” perform better than generic titles, such as “Overview.” You can affect findability as part of your XML authoring process, however, by requiring authors to provide metadata, such as keywords for each topic. Another factor that affects search engine rankings is whether content is updated regularly (“fresh”) or not (“stale”). The automated publishing process in an XML approach makes it easier to update content and therefore improves findability.
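One way to enforce a metadata requirement is an automated check that runs before publishing. The sketch below is a hypothetical example: the `<keyword>` element name, the topic file names, and the `topics_missing_keywords` function are assumptions made for illustration, and a real project would more likely enforce the rule through its schema or CMS.

```python
import xml.etree.ElementTree as ET

# Hypothetical topics; the <title> and <keyword> element names are assumptions.
TOPICS = {
    "configure.xml": (
        "<topic><title>Overview of XYZ Widget Configuration</title>"
        "<keyword>configuration</keyword><p>Body text.</p></topic>"
    ),
    "overview.xml": "<topic><title>Overview</title><p>Body text.</p></topic>",
}


def topics_missing_keywords(topics: dict[str, str]) -> list[str]:
    """Return the names of topics that have no non-empty <keyword> element."""
    missing = []
    for name, xml_text in topics.items():
        root = ET.fromstring(xml_text)
        keywords = [k.text for k in root.iter("keyword") if k.text and k.text.strip()]
        if not keywords:
            missing.append(name)
    return sorted(missing)


missing = topics_missing_keywords(TOPICS)
```

Here `missing` lists the one topic without keywords, which could then be returned to its author before publication.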
Discoverable content is information that a reader locates by following links from other content. You can affect discoverability by building useful navigational links in your content. The most important factor in discoverability, though, is reputation. If readers find your content useful, they will link to it, and those links make your content more discoverable.
If content is posted on a web server, you can use web analytics tools to evaluate findability and discoverability of your content. This, in turn, gives you the ability to identify high-value topics and, by implication, the authors who are creating the most popular information.
Transparency and metrics
Working in an XML-based environment makes it easier to get insights into the content development process for several reasons:
- Text-based file format. It’s much easier to analyze text files than binary (and usually proprietary) file formats.
- Parsing options. With XML, you can use XSL and other programming tools to analyze the content.
- Metrics from within a content management system (CMS). If you put your XML content in a CMS, you can use the CMS’s various tools for content metrics.
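To illustrate the first two points, the sketch below uses Python's standard XML parser rather than XSL to pull simple figures out of a topic. The sample topic and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical topic used only for this example.
SAMPLE = (
    "<topic><title>Replace the filter</title>"
    "<p>Turn off the pump.</p>"
    "<note>Wear gloves.</note>"
    "<p>Remove the cover.</p></topic>"
)


def element_counts(xml_text: str) -> Counter:
    """Count how often each element name appears in a topic."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


def word_count(xml_text: str) -> int:
    """Count whitespace-separated words in the topic's text content."""
    root = ET.fromstring(xml_text)
    return len(" ".join(root.itertext()).split())


counts = element_counts(SAMPLE)
words = word_count(SAMPLE)
```

Counts like these describe the content itself. They say nothing on their own about writer productivity.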
With transparency, however, comes the responsibility to use transparency wisely. It’s very easy to calculate a writer’s productivity in hours per page or topics per week. These metrics are at best useless and at worst counterproductive.
Transparency is an issue not just for the content creators, but also for their management.
An XML-based workflow makes the content creation process more measurable. You can assess changes in XML files via basic text tools, such as diff (which measures the changes from one version to another). Another factor is that because authors no longer have ownership of the formatting process, they are accountable only for content creation, so there are fewer tasks to measure.
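As a rough sketch of this kind of comparison, the example below uses Python's standard `difflib` module in place of the command-line `diff` tool. The two topic versions and the `count_changes` function are hypothetical. Line-based counts are also sensitive to how the XML is wrapped, so treat the numbers as approximate.

```python
import difflib

# Two hypothetical versions of the same topic, one element per line.
OLD = """<topic>
<title>Replace the filter</title>
<p>Turn off the pump.</p>
<p>Remove the cover.</p>
</topic>""".splitlines()

NEW = """<topic>
<title>Replace the filter</title>
<p>Turn off and unplug the pump.</p>
<p>Remove the cover.</p>
<note>Wear gloves.</note>
</topic>""".splitlines()


def count_changes(old: list[str], new: list[str]) -> tuple[int, int]:
    """Return (added, removed) line counts between two versions."""
    added = removed = 0
    for line in difflib.unified_diff(old, new, lineterm=""):
        # Skip the '+++' and '---' file header lines of the unified diff.
        if line.startswith("+") and not line.startswith("+++"):
            added += 1
        elif line.startswith("-") and not line.startswith("---"):
            removed += 1
    return added, removed


added, removed = count_changes(OLD, NEW)
```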
A few writers have typically spent a lot of time on formatting and on addressing formatting problems to compensate for their relatively weaker writing skills. For them, the removal of formatting responsibilities is highly problematic.
Over time, managers can use the profusion of available data to begin evaluating the workflow. For example, a manager might be able to see that Person A takes twice as long to create a topic as Person B. Or perhaps Person A consistently delivers topics that cost more to localize than those of others in the group. It’s important to go beyond these statistics to understand the root cause. It’s tempting to conclude from these two examples that Person B is a better writer, but any of the following factors could also be in play:
- Person A is writing new content; Person B is updating existing content.
- Person A is writing about a more complex product than Person B.
- Person B is working with a cooperative subject matter expert, and Person A has the resident curmudgeon.
As a manager, you must go beyond perceived efficiency and understand what’s really happening.
The obvious, easily measured metrics are generally not very useful. For instance, there’s a temptation to measure technical writers by gross output (pages per day) or superficial productivity (topics per hour). These metrics are seductive because they are easy to calculate.
But it’s an axiom of management that people will focus on whatever is measured. If you judge people by page count, they will produce lots and lots of pages. (Many of us succumbed to the “make the font bigger” approach in high school to fill out required pages for writing assignments.) If you measure writers by the number of topics they produce, you can expect to see lots and lots of tiny topics.
Furthermore, this raw measurement of productivity doesn’t measure document quality.
Calculating document quality
Creating a useful measurement system for document quality requires you to go deeper than just pages per hour. (For software developers, the equivalent sloppy metric is “lines of code per month.”)
We recommend developing a measurement system based on the following ducky components:
Quality: This measures the correct application of grammar, mechanics, style guidelines, consistency, and similar properties. Writing quality is more important for those who speak English as a second language, have lower literacy levels, or are finicky about language (such as English teachers). Writing quality is generally less important for an audience of highly motivated specialists; for example, software developers reading very technical API documentation.
Usability: Writing quality measures the ease of comprehension of the text, graphics, or other content by the intended audience. Usability measures the ease of access to the information. To evaluate usability, you look at factors such as the document navigation system (headers and footers for print; breadcrumbs and the like for online). Did the author employ the proper medium for a particular piece of content? For example, are illustrations provided instead of—or in addition to—lengthy textual explanations? Is the content presented in an attractive, appealing way? Are simulations and video available? High usability is especially important if users can simply choose not to use the product. For example, for a consumer product, such as a cell phone, high usability is important because consumers have lots of options. For products that people must use as part of their job, usability is important to ensure that people can get work done.
Accuracy: Does the content describe the product’s features correctly? This factor is especially important for high-stakes documents, such as the guide for a machine that delivers radioactive isotopes for nuclear medicine. A mistake in casual game-playing instructions is not of much concern.
Completeness: Are all of the product features documented? Game documentation often includes only the bare minimum and allows players to discover features for themselves as they play the game. However, regulated products, such as medical devices, are required to have complete documentation.
Conciseness: Documents should have as much content as required, and no more. Verbose documents are more difficult to understand, and they increase the cost of localization and printing. This principle is closely related to minimalism.
The overall equation is as follows:
Refine the calculation for your specific environment. For example, you might divide 100 total quality points among the five measurements in different ways depending on your industry.
|Metric|Regulated documentation|Consumer documentation|
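One possible way to combine the five components into a single score is a weighted sum, in which the weights are the 100 quality points divided among the measurements. The sketch below assumes that approach. The weights and ratings in it are invented purely for illustration and are not figures from this text.

```python
COMPONENTS = ("quality", "usability", "accuracy", "completeness", "conciseness")


def quality_score(ratings: dict[str, int], weights: dict[str, int]) -> float:
    """Combine per-component ratings (0-100) into one score (0-100).

    `weights` divides 100 points among the five components.
    """
    if set(weights) != set(COMPONENTS) or set(ratings) != set(COMPONENTS):
        raise ValueError("ratings and weights must cover all five components")
    if sum(weights.values()) != 100:
        raise ValueError("weights must total 100 points")
    return sum(weights[c] * ratings[c] for c in COMPONENTS) / 100


# Hypothetical weights: a regulated product might stress accuracy and completeness.
regulated_weights = {
    "quality": 10, "usability": 10, "accuracy": 40,
    "completeness": 30, "conciseness": 10,
}

# Hypothetical ratings for one document, each on a 0-100 scale.
ratings = {
    "quality": 80, "usability": 50, "accuracy": 100,
    "completeness": 90, "conciseness": 50,
}

score = quality_score(ratings, regulated_weights)
```

Changing only the weights, for example to give usability more points for consumer documentation, changes the score for the same ratings.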
Skill sets for technical communicators
In desktop publishing environments, technical communicators need domain knowledge, writing ability, and tools knowledge in roughly equal parts, with a smattering of people skills.
Typical skill set for a desktop publishing environment
The exact weighting given each skill varies by industry and for specific jobs. For example, domain knowledge is more important for extremely technical products, such as laboratory equipment or software for programmers. Domain knowledge is less important and more easily acquired for consumer products, such as mobile phones or aquarium pumps. The potential danger of using a product incorrectly also affects this requirement: medical devices, nuclear power plants, and machinery tend to require more domain knowledge, whereas relatively innocuous products such as headsets, board games, or consumer electronics require less.
All technical communicators are expected to be competent writers. Different industries, however, emphasize different aspects of writing. For example, some companies use controlled English or Simplified Technical English, so they require writers to limit their word choices, keep sentence structure simple, and follow other rules. For consumer electronics that include the user guide as part of the out-of-box experience (OOBE), technical content may need more flair. For a world-class example of this approach, check out any Apple product.
In most desktop publishing environments, writers are expected to create the content and the final deliverable. To publish content, writers must be able to use the designated publishing tools—Word, FrameMaker, RoboHelp, Flare, Doc2Help, Author-IT, and so on. Writers who lack sufficient tool skills will either be unable to produce content that meets the corporate requirements or will be very inefficient in doing so.
Although people skills are certainly necessary to extract information from subject matter experts or to work successfully in any professional environment, technical writers are often portrayed as introverted, easily offended, and not interested in working with others. (The most famous example of this is, of course, Tina the Tech Writer in the Dilbert comic strip.) In a desktop publishing environment where technical writers are responsible for a single document from start to finish, only minimal collaboration is needed. Each writer can take ownership of a single deliverable (book or help system). This division of labor is encouraged by the software, especially the tools that have been around for a while—it’s quite challenging to share information or to collaborate with multiple writers working on a single deliverable.
In an XML-based environment, the skill set requirements shift:
Tools are much less important for the content creator. The publishing process is automated, and manual production niceties such as copyfitting or adjusting hyphenation usually fall by the wayside. Thus, the content creators must learn the XML tags and their proper use, along with how to employ those tags in their authoring tool. But their responsibility for document formatting disappears, so the amount of expertise required in tools decreases. Setting up the publishing process typically becomes the responsibility of a specialist, and for that person, the tool requirements are greatly increased.
Writing ability becomes more important. XML authoring often implies topic-based authoring and content reuse across the technical communication workgroup. If authors want to combine their topics in a deliverable, they must deliver content that is organized consistently and that follows consistent usage patterns. Writing to accommodate collaboration and reuse is more challenging than writing standalone content.
People skills must improve to facilitate better collaboration. The days of the hermit writer are numbered.
Domain knowledge expands to take up the bandwidth formerly taken up by publishing tools. In the past, a writer could make up for weaker domain knowledge with tools expertise, but now, there’s no chance to hide behind formatting work.
Skill sets in an XML authoring environment