Scriptorium Publishing

content strategy for technical communication

A hierarchy of content needs

March 4, 2014 by

Some thoughts on how to evaluate content needs as a foundation for content strategy.

In working through the idea of minimum viable content, I decided to build out a hierarchy of content needs based on Maslow’s hierarchy. In Maslow’s pyramid, the basic needs (like food and water) are on the bottom. If you don’t have the basics, you’re unlikely to be interested in the top layer (self-actualization).

What happens if you look at content needs?

Scriptorium hierarchy of content needs (based on Maslow's hierarchy of needs). From bottom to top, the layers are: available, accurate, appropriate, connected, and intelligent.

The bottom three layers are what’s required for minimum viable content.

This hierarchy helps us with content strategy work. When your content is not even available in a useful format, focusing on social engagement (the connected layer) is probably premature.
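One way to apply the hierarchy during an assessment is to work from the bottom up and stop at the first layer that fails. Here is a minimal Python sketch. It assumes you record a simple pass/fail judgment for each layer; the function names and the pass/fail model are illustrative assumptions, not a formal assessment method.

```python
# Layers of the hierarchy of content needs, ordered from bottom to top.
LAYERS = ["available", "accurate", "appropriate", "connected", "intelligent"]

# The bottom three layers make up minimum viable content.
MINIMUM_VIABLE = LAYERS[:3]


def next_focus(passed):
    """Return the lowest layer that is not met, or None if every layer is met.

    `passed` is a set of layer names judged to pass (a hypothetical
    assessment result). Higher layers are ignored until every layer
    below them passes.
    """
    for layer in LAYERS:
        if layer not in passed:
            return layer
    return None


def is_minimum_viable(passed):
    """True when all of the bottom three layers pass."""
    return all(layer in passed for layer in MINIMUM_VIABLE)


# Content with a social layer but no usable delivery format:
# work on "connected" is premature until "available" is fixed.
assessment = {"accurate", "connected"}
print(next_focus(assessment))
print(is_minimum_viable(assessment))
```

The point of ordering the check is the same as the pyramid's: a passing grade on "connected" does not count for much while "available" still fails.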


Available

Available content means that information exists, and the person who needs it has access to it. If the content hasn’t been written, if it exists but the reader can’t find it, or if it’s behind a firewall/login that the reader doesn’t know about, then content fails the available test.

The first step in meeting content needs is to make the information available to people who need it. You can push information via email, publish to the web or a private site, or send out printed catalogs. But “available” is the first critical need.

In the available category, we also look at whether content is findable, searchable, and discoverable—if readers can’t successfully locate the content they need, it exists, but it’s not really available to the readers.


Accurate

Content should be accurate. Available but inaccurate isn’t so good. Under this category, we can also evaluate information for grammar, formatting, consistency, and other decorations that improve the content quality.



Appropriate

Appropriate content is delivered in the right language, in the right format, at the right level of complexity. This is where we put together the user’s needs with the delivery possibilities.

Content is “available” when you put it in a crappy PDF and email it on request.

To pass the “appropriate” test, you must deliver that information in a way that is best for the user. Depending on your user, that could mean a mobile-friendly HTML web site or an EPUB file. There’s delivering content, and then there’s delivering content WELL.


Connected

The connected layer is where you add user engagement and social layers. This is hard to discuss without using terrible buzzwords, but what I’m looking for here is the ability for readers to comment on your content, vote it up and down, and perhaps edit content or create their own content.

You want to support users in engaging with your content.


Intelligent

The pinnacle is content that isn’t just a static piece of text, but information that can be manipulated for different purposes.

Intelligent content might include personalized content, interactive service manuals, the ability to filter information based on my needs, and more.

Often, delivering intelligent content requires you to integrate database content (for example, product specifications) with authored content. Troubleshooting instructions, for example, might be integrated with information from the organization’s parts database.

Here’s a slightly more detailed version of the pyramid.


What do you think? Can you use the hierarchy of content needs to assess your content requirements? Do you agree with the hierarchy?

UPDATE: Hilary Marsh also built a content needs hierarchy, but hers was published in July 2013 (much earlier than this one). I don’t recall seeing her article, but it’s possible that it’s been kicking around in my subconscious mind for several months. That said, we do have some significant differences, although happily we both have “accurate” as an important base layer in the pyramid.

What I learned at tcworld India

February 25, 2014 by

Some thoughts after a trip to Bangalore for the tcworld India event.

Having experienced roughly eight blocks of downtown Bangalore, I will refrain from making broad generalizations about India.

The TWIN and tcworld teams who were responsible for organizing the event did not disappoint. They had a great roster of speakers and topics. Many thanks to them and to all their volunteers. I know that making an event appear to run smoothly requires extensive (and usually frantic) work behind the scenes.

Indian audiences ask great questions. In the US and Europe, it’s common to get a statement in the form of a question, a question designed to show off the questioner’s own expertise, or an extremely specific question that is only relevant to the person asking. (A graphic that shows the problem.) The questions I had after my presentations in Bangalore were pertinent to the topic and quite insightful. Several people did have detailed, specific questions–but they all approached me later in the event when we had time for a lengthy discussion.

Technical communicators at this event were very enthusiastic. It was a welcome contrast from the often jaded perspective we get in the US. I do hope that India’s tech comm community will not learn from their US counterparts in this area. I ran out of handouts due to an overflow audience in my content strategy workshop…but it wasn’t until I walked around the room and noticed that people were sharing handouts that I realized this. Nobody complained or inquired about the availability of additional handouts. In any other venue, at least one person would have asked, “hey, are there any more handouts??”

Some notes on culture… It appears that the etiquette rules for cell phones are different. There was no request to turn off or silence cell phones, and several people took calls during the sessions (while sitting in the session). That said, the protocol appears to be that you take a call very quietly, hunched over, with your hand covering the phone. It wasn’t noticeable unless you were very close by. Somehow, given the prevalence of Loud Talkers, I don’t really see this approach working in the US.

India is a really long way from the US East Coast. Yes, I know, DUH, but until you actually get on a plane and experience the travel for yourself, it is quite indescribable. A 10.5-hour time difference, 20-30 hours of actual travel door to door. It is NOT pleasant. After arrival, I had a couple of days to get adjusted, and spent my time in useful pursuit of scarves and a lovely carpet. I’ll note that India already has chip-and-PIN credit card terminals, unlike the “advanced” United States.

Service standards in India are much higher than elsewhere. I could get used to the tea server, or the guy whose job it is to bring me my omelet from the breakfast station. (And who first told me there would be a slight wait and then apologized profusely when he delivered the food a very few minutes later.) All this wonderful service is made possible by the large number of staff. At breakfast, I eventually counted and realized that we had about 10 servers working. In an equivalent U.S. venue, I would have expected to see 2-3 harried people. In the case of the five-star “Western” hotels in Bangalore, the hotel rates are perhaps half what you would pay for the same hotel in the U.S. or Europe–high by Indian standards, but low by US/European standards.

Speaking of service, I also noticed that I kept seeing the same staffers…the guy who checked me in with a smile at 4 a.m. was still at the front desk at 11 a.m. The hostess at dinner was working again at breakfast. The lunch guy also brought up an evening room service tray. I’m not sure exactly what the shifts are, but they are definitely more than eight hours long.

Back to the conference…we’re always talking about global content, global audiences, and the like. Numerous people approached me during the event to tell me that they follow our web site. Hearing “oh, I love your web site” after a trip halfway around the world? THAT gets my attention. It also made me think…we have web site analytics, so we are aware that India is our fourth-largest readership (after US, Canada, and Germany). But charts are not at all the same thing as real live people telling me that they read our stuff.

This is why we travel for conferences: to meet people and make personal connections, to understand how a particular location shapes people’s thinking, and to expand our horizons past our own community and culture.

Many thanks to all the participants at tcworld India. I enjoyed getting a glimpse of your world, and I hope to return.

Strange bedfellows: InDesign and DITA

January 27, 2014 by

or, What you need to know before you start working on a DITA to InDesign project.

There are a lot of ways to get your DITA content rendered into print/PDF. Most of them are notoriously difficult; DITA to InDesign, though, may have the distinction of the Greatest Level of Suck™.

InDesign XML formats

InDesign provides several XML formats. InDesign Markup Language (IDML) is the most robust. An IDML file is a zip container (similar to an EPUB). If you open up an IDML archive, you’ll find files that define InDesign components, such as pages, spreads, and stories. If you save a regular InDesign file to IDML, you can reopen the IDML file and get back your InDesign file, complete with layouts, graphics, formatting, customizations, and so on.
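Because an IDML file is an ordinary zip archive, you can inspect one without InDesign. A minimal sketch using Python’s standard zipfile module; grouping entries by top-level folder is just a convenient way to see the spreads, stories, and other components, and the function name is hypothetical.

```python
import zipfile
from collections import defaultdict


def idml_contents(path_or_file):
    """Group the entries of an IDML (zip) package by top-level folder."""
    groups = defaultdict(list)
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            folder, _, rest = name.partition("/")
            # Files at the package root (no folder) are grouped under "(root)".
            key = folder if rest else "(root)"
            groups[key].append(name)
    return {key: sorted(names) for key, names in groups.items()}
```

For example, idml_contents("sample.idml") (a hypothetical file name) returns a dictionary keyed by folder, so you can see at a glance how many story and spread files a document produces.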

IDML is both a file format and a markup language. The IDML language is used inside the IDML file. In addition, a subset of IDML markup is used in InCopy files (ICML). Where IDML can specify the entire InDesign file, ICML just describes a single text flow.

(There is also INX, but that format is for older versions of InDesign and has now been deprecated.)

If you are planning to output from DITA to InDesign, you probably want ICML. The IDML language is used in both IDML and ICML files. The IDML specification is available as a very user-friendly PDF on Adobe’s site. I spent many not-glorious hours plowing through that document.

My best tip: If you need to understand how a particular InDesign component is set up in IDML, create a small test file and then save the file out to InCopy (ICML) format. This will give you an almost manageable snippet to review. You’ll find that InDesign includes all possible settings in the exported file. When you create your DITA-to-ICML converter, you can probably create a snippet that is 90 percent smaller (and includes much less stuff). The challenge is figuring out which 10 percent you must keep.
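When you are deciding which 10 percent of an exported snippet to keep, it can help to count what is in it first. A minimal Python sketch, assuming you have saved a small test file out to ICML; it only tallies element names and applied styles, and does not interpret them.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def inventory(root):
    """Count element names and applied paragraph/character styles under `root`."""
    elements = Counter(elem.tag for elem in root.iter())
    styles = Counter()
    for elem in root.iter():
        for attr in ("AppliedParagraphStyle", "AppliedCharacterStyle"):
            if attr in elem.attrib:
                styles[elem.attrib[attr]] += 1
    return elements, styles
```

Usage: elements, styles = inventory(ET.parse("test.icml").getroot()), where test.icml is a hypothetical export. Elements that appear once per paragraph are the ones your converter must generate; the long tail of settings is where trimming starts.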

Understanding the role of InDesign templates

Use an InDesign template to specify page masters, paragraph styles, character styles, table styles, and more. This template becomes your formatting specification document.

To import XML content, do the following:

  1. Create an ICML/IDML file that contains references to paragraph styles and other styles (more on this later).
  2. In InDesign, open a copy of the template file.
  3. Place the ICML file in your template copy. The style specifications in the template are then applied to the content in the ICML and you get a formatted InDesign file.

Of course, this nifty three-step procedure elides many months of heartbreak.

The mapping challenge

A basic paragraph, in DITA, looks like this:

<p>Paragraph text goes here.</p>

The equivalent output in IDML is this:
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/body">
   <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
      <Content>Paragraph text goes here.</Content>
      <Br/>
   </CharacterStyleRange>
</ParagraphStyleRange>

Some things to notice:

  • The inline formatting (CharacterStyleRange) is specified even when there is no special formatting.
  • The content is enclosed in a <Content> tag.
  • The <Br/> tag toward the end is required. Without it, the paragraphs run together. In other words, if you do not specify a line break, InDesign assumes that you do not want line breaks between paragraphs.
  • Extra whitespace inside the <Content> tag (such as tabs or spaces) will show up in your output. You do not want this.
  • Managing space between paragraph and character tags is highly problematic.
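These rules (an explicit character style range even for plain text, a <Content> element, a trailing <Br/>, and no stray whitespace) can be captured in a small conversion function. A minimal Python sketch, assuming a plain DITA <p> with no inline markup and a paragraph style named body; a production converter would more likely be XSLT inside the DITA Open Toolkit, so treat this only as an illustration of the output rules.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

NO_CHAR_STYLE = "CharacterStyle/$ID/[No character style]"


def normalize_space(text):
    """Collapse runs of whitespace to single spaces and trim the ends.

    Whitespace inside <Content> shows up in the InDesign output, so source
    indentation and line breaks must not leak through.
    """
    return " ".join(text.split())


def p_to_idml(p_elem, para_style="body"):
    """Convert a DITA <p> (text only, no inline elements) to an IDML paragraph."""
    text = normalize_space("".join(p_elem.itertext()))
    # Everything is emitted on one line so no whitespace sits between tags.
    return (
        f'<ParagraphStyleRange AppliedParagraphStyle={quoteattr("ParagraphStyle/" + para_style)}>'
        f"<CharacterStyleRange AppliedCharacterStyle={quoteattr(NO_CHAR_STYLE)}>"
        f"<Content>{escape(text)}</Content>"
        "<Br/>"  # without this, consecutive paragraphs run together
        "</CharacterStyleRange>"
        "</ParagraphStyleRange>"
    )
```

For example, a <p> whose text is indented and wrapped across several source lines comes out as a single <Content> element containing one clean line of text, followed by <Br/>.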

Other important information:

  • You must declare the paragraph and character tags you are using at the top of the IDML file in the RootParagraphStyleGroup and RootCharacterStyleGroup elements, respectively:
    <RootCharacterStyleGroup>
       <CharacterStyle Self="CharacterStyle/$ID/[No character style]" Name="$ID/[No character style]"/>
    </RootCharacterStyleGroup>
    <RootParagraphStyleGroup>
       <ParagraphStyle Self="ParagraphStyle/body" Name="body"/>
    </RootParagraphStyleGroup>

  • You cannot nest character tags in InDesign. Therefore, if you have nested inline elements in DITA, you must figure out how to flatten them:
    <b><i>This is a problem in InDesign</i></b>

    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/BoldItalic">
       <Content>You have to create combination styles in InDesign.</Content>
    </CharacterStyleRange>

  • Generally, you will have more InDesign paragraph styles than DITA elements because DITA (and XML) have hierarchical structure. For example, a paragraph p tag might be equivalent to a regular body paragraph, an indented paragraph (inside a list), a table body paragraph, and more. You have to use the element’s context as a starting point for mapping it to InDesign equivalents.
  • In addition to using hierarchical tags, if you want to maintain compatibility with specializations, you must use class attributes rather than element names for your matches. That leads to some highly awkward XSLT template match statements, such as:
    <xsl:template match="*[contains(@class,' topic/ul ')]/*[contains(@class,' topic/li ')]">

  • In addition to paragraph and character styles, you need to declare graphics, cell styles, table styles, object styles, and colors. (There may be more. That’s what I found.)
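Several of these rules work together: match on class attributes, pick a paragraph style from context, flatten nested inline elements into combination styles, and declare every style you use. A minimal Python sketch of that logic, for illustration only; the style names cell_body, list_para, and BoldItalic-style combinations are hypothetical, the <Br/> and whitespace handling are left out to keep it short, and a real converter would handle far more contexts.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical paragraph styles chosen by context, most specific first.
CONTEXT_STYLES = [("topic/entry", "cell_body"), ("topic/li", "list_para")]
# Hypothetical class tokens for inline elements and their style-name parts.
INLINE_PARTS = {"hi-d/b": "Bold", "hi-d/i": "Italic", "hi-d/u": "Underline"}
NO_STYLE = "$ID/[No character style]"


def has_class(elem, token):
    """Same test as XSLT contains(@class, ' topic/li '), so specializations match too."""
    return f" {token} " in elem.get("class", "")


def paragraph_style(p_elem, parents):
    """Pick a paragraph style from the nearest ancestor that sets a context."""
    node = parents.get(p_elem)
    while node is not None:
        for token, style in CONTEXT_STYLES:
            if has_class(node, token):
                return style
        node = parents.get(node)
    return "body"


def combined_style(parts):
    """Name a combination style; sorting maps <b><i> and <i><b> to the same one."""
    return "".join(sorted(parts)) if parts else NO_STYLE


def flatten_runs(elem, parts=()):
    """Return (style, text) runs, with nested inline elements flattened into one style."""
    runs = [(combined_style(parts), elem.text)] if elem.text else []
    for child in elem:
        part = next((v for k, v in INLINE_PARTS.items() if has_class(child, k)), None)
        child_parts = parts + (part,) if part and part not in parts else parts
        runs.extend(flatten_runs(child, child_parts))
        if child.tail:  # text after the child returns to the enclosing style
            runs.append((combined_style(parts), child.tail))
    return runs


def convert(root):
    """Convert each topic/p into a paragraph range with flat character ranges."""
    parents = {child: parent for parent in root.iter() for child in parent}
    used_styles, paragraphs = set(), []
    for p in root.iter():
        if not has_class(p, "topic/p"):
            continue
        runs = flatten_runs(p)
        used_styles.update(style for style, _ in runs)
        ranges = "".join(
            f"<CharacterStyleRange AppliedCharacterStyle={quoteattr('CharacterStyle/' + s)}>"
            f"<Content>{escape(t)}</Content></CharacterStyleRange>"
            for s, t in runs
        )
        para = quoteattr("ParagraphStyle/" + paragraph_style(p, parents))
        paragraphs.append(f"<ParagraphStyleRange AppliedParagraphStyle={para}>{ranges}</ParagraphStyleRange>")
    # Every character style used must be declared in RootCharacterStyleGroup.
    decls = "".join(
        f"<CharacterStyle Self={quoteattr('CharacterStyle/' + n)} Name={quoteattr(n)}/>"
        for n in sorted(used_styles)
    )
    return f"<RootCharacterStyleGroup>{decls}</RootCharacterStyleGroup>", paragraphs
```

Because the matching uses class tokens, a specialized element such as a task step (class "- topic/li task/step ") gets the list paragraph style without any extra rule.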


Tables are not your friend. InDesign uses a particularly…unique table structure, in which it first declares the table grid and then just lists off all the cells. The grid coordinates start at 0:0. (Most “normal” table structures group the cells into rows explicitly.)

<Table TableDirection="LeftToRightDirection" Self="aaa" ColumnCount="2">
   <Row Self="bbb" Name="0"/>
   <Row Self="ccc" Name="1"/>
   <Column Self="ddd" Name="0" SingleColumnWidth="100"/>
   <Column Self="eee" Name="1" SingleColumnWidth="100"/>

   <Cell Self="fff" Name="0:0" RowSpan="1" ColumnSpan="1">
      <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/cell_center">
         <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
            <Content>first cell content goes here</Content>
         </CharacterStyleRange>
      </ParagraphStyleRange>
   </Cell>
   [...much more here...]
</Table>

As you can see, this gets complicated fast.
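Most source table models group cells into rows, so the converter has to turn them into InDesign’s grid-then-cells list. A minimal Python sketch, assuming a simple table with equal-length rows and no spans. The Self IDs are placeholders, and the column:row order of the cell names is an assumption about InDesign’s convention, so check it against a table exported from InDesign.

```python
from xml.sax.saxutils import escape

NO_CHAR_STYLE = "CharacterStyle/$ID/[No character style]"


def flatten_table(rows, column_width=100, cell_style="cell_center"):
    """Convert row-grouped cells into an IDML-style grid plus a flat list of cells.

    Assumes equal-length rows and no row or column spans. Self IDs are
    placeholders; cell names use an assumed column:row order.
    """
    row_count, column_count = len(rows), len(rows[0])
    parts = [f'<Table TableDirection="LeftToRightDirection" Self="t1" ColumnCount="{column_count}">']
    # 1. Declare the grid: every row, then every column (indices start at 0).
    parts += [f'<Row Self="r{r}" Name="{r}"/>' for r in range(row_count)]
    parts += [
        f'<Column Self="c{c}" Name="{c}" SingleColumnWidth="{column_width}"/>'
        for c in range(column_count)
    ]
    # 2. List the cells one after another; only the Name ties a cell to the grid.
    for r, row in enumerate(rows):
        for c, text in enumerate(row):
            parts.append(
                f'<Cell Self="cell_{c}_{r}" Name="{c}:{r}" RowSpan="1" ColumnSpan="1">'
                f'<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/{cell_style}">'
                f'<CharacterStyleRange AppliedCharacterStyle="{NO_CHAR_STYLE}">'
                f"<Content>{escape(text)}</Content>"
                "</CharacterStyleRange></ParagraphStyleRange></Cell>"
            )
    parts.append("</Table>")
    return "".join(parts)
```

For example, flatten_table([["a", "b"], ["c", "d"]]) declares two rows and two columns, then lists four cells named 0:0, 1:0, 0:1, and 1:1. The output has no whitespace between tags, given the whitespace problems described earlier.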

There is so much more, but I think you get the idea. It is definitely possible to create a DITA to InDesign pipeline, but it is challenging. If you are looking at a project like this, you will need the following skills:

  • Solid knowledge of InDesign
  • Solid knowledge of DITA tag set
  • Ability to build DITA Open Toolkit plugins, which means knowledge of Ant and XSLT at a minimum

The open source DITA4Publishers project provides a pipeline for output from DITA to InDesign. We looked at using it as a starting point in mid-2013. At the time, we found that it would be too difficult to modify DITA4Publishers to support the extensive customization layers required by our client.

Our DITA to InDesign converter is a DITA Open Toolkit plugin (built on version 1.8). It supports multiple unrelated templates for different outputs and specialized content models. It also includes support for index markers, graphics, and other items not addressed in this document. Scriptorium is available for custom plugin work on InDesign and other output types. For more information, contact us.


Trends in technical communication, 2014 edition

January 21, 2014 by

Our annual prognostication, along with an assessment of our predictions from last year.

2013 in review

Our predictions from 2013:

  • Velocity: the requirement for faster authoring, formatting, publishing, delivery, and updates is forcing tech comm into significant changes. This was a theme in our presentations and consulting projects this year.
  • Mobile requirements change tech comm. This is happening, but more is needed.
  • Rethinking content delivery. Again, I’d like to see more.
  • Bill predicted that PDF would continue to thrive and that localization requirements would keep growing, along with another trend related to mobile.

Looking back, these predictions seem quite cautious, but I think they are largely accurate as trends.

On to 2014…


Trend 1: People like their silos.

Photo: four silos in a row (flickr: docsearls)

Despite increased talk about breaking down silos, people still like them. Working in silos gives those working within them and managers overseeing them a sense of control over their content, whether real or perceived. Technology investments within individual silos will continue for the foreseeable future. This technology will need to “talk” to the technology used within other silos—using a common format—to efficiently share content. While people may like their silos, executive management is growing less fond of them, regarding them as roadblocks to collaboration and contributing factors to excess overhead costs. Looking forward, we will start to see technology investments across or outside of silos that further centralize content management and ease the burden of reusing content across groups.


Trend 2: Reorienting toward a customer perspective of content

Photo: looking down into a silo (flickr: neurollero)

Organizations are beginning to look at content from the customer’s point of view. Customers just want information; they don’t care about the internal differences between a knowledge base article and a topic from tech comm. As a result, the pressure to integrate these diverse roles and deliver unified information (usually to the corporate web site) is increasing. Executives are aware of these issues, and they do not want to hear about how the internal structure of an organization makes it problematic to deliver what they want.


Trend 3: Blurring of tech comm, marcom, and content strategy

Photo: blurred silos at night (flickr: mahalie)

The term “content strategy” has been heavily used by those working in tech comm and marcom over the past few years, though the focus has differed. In the scope of tech comm, content strategy primarily involves the process of creating, managing and producing content. The marcom focus of content strategy has traditionally been on audience engagement. The line between these two camps is now blurring. Tech comm is increasingly interested in and responsible for tracking the effectiveness of content, and marcom has an increased awareness of the content management lifecycle and of localization requirements. We’ll see an increase in collaboration between tech comm and marcom as the lines continue to blur, and perhaps a new discipline will emerge as the trend continues.


Trend 4: Apps are winning over HTML5

Photo: brick silo (flickr: tinfoilraccoon)

This one requires context. A lot of our customers are building apps for their content. In particular, they are using apps for information that is delivered to technical support or field service staff (employees who go to customer sites and install or fix things). For that specific use case, most companies are opting for an app because:

  • The company provides the employee with a device (usually a tablet) with the technical content. As a result, the target device is known—the company can create a single app on a single operating system.
  • They want functionality, especially integration with other systems, that is easier to achieve with an app than with HTML5.
  • They like the control provided by the app update process.

Does this fly in the face of the bring your own device (BYOD) trend? Yes, it does.


Trend 5: Lots of creativity in output—not source

Photo: silos with art and a QR code (flickr: cstreetus)

Business and consumer demands drive content requirements, and the rate at which these demands change is increasing. New apps, new delivery formats, and new use scenarios are constantly introducing new requirements. To respond to these evolving requirements, content developers will need to get creative on the output side while standardizing on a simpler, cleaner source. This will ensure that content is poised and available for transformation and production to required formats.


Trend 6: Content can be an asset—or a liability

Photo: looking at the sky from inside a silo (flickr: alisonpostma)

Like other content strategists, we have been arguing for some time that content is a corporate asset and should be managed properly. But “content is an asset” doesn’t tell the whole story because the corollary is that:

Bad content is a liability.

Organizations are beginning to recognize that their content can help or hurt them. They ignore content at their peril.

What do you think of our trends? Did we miss one?

Webcast: Trends in technical communication 2014

January 17, 2014 by

In this webcast recording, Sarah O’Keefe and Bill Swallow of Scriptorium Publishing discuss what’s new in technical communication. Alan Pringle moderates.

Trend 1: People like their silos.
Trend 2: Reorienting toward a customer perspective of content
Trend 3: Blurring of tech comm, marcom, and content strategy
Trend 4: Apps are winning over HTML5
Trend 5: Lots of creativity in output—not source
Trend 6: Content can be an asset—or a liability

Will it blend? Content management software and localization companies

January 6, 2014 by

Vasont, TransPerfect, and Astoria. Really??

Disclaimer: This post is complete speculation. I have no useful inside information to work with regarding the merger.

As you may have heard, TransPerfect recently acquired Vasont. (The press release uses words like “merge” and “integrate” and carefully avoids the A-word.) This is the second component CMS that TransPerfect has acq…merged with in the past few years. The first one was Astoria in 2010.

Thus, TransPerfect now has a lineup of localization services, translation management software, and two component CMSes.

Localization service providers (LSPs) are facing a generally difficult market—there’s a ton of demand for localization, but vendors are squeezed because of the following factors:

  • Most customers focus on price (pennies per word) and not quality.
  • Increased use of machine translation (sometimes on the client side, sometimes on the vendor side).
  • Increased use of automated formatting (based on XML), which greatly reduces the revenue stream from desktop publishing.
  • Use of technologies that support incremental translation and better pre-translation matching (thus reducing the total number of words to be translated by the vendor).

Given these challenges, it seems logical to extend the LSP’s revenue stream with any or all of the following:

  • Localization software, such as translation management and terminology management systems
  • Content creation software, such as content management systems
  • Professional services, such as systems integration, content strategy consulting, and so on

Actually making this happen is challenging for the LSP because:

  • Selling enterprise software is different from selling localization services.
  • The LSP must become a trusted partner rather than a commodity supplier.
  • To sell software or services related to content development, the LSP must be involved at the beginning of the content lifecycle. Most localization services are sold at the end of the content lifecycle.
  • Most clients have separate content and localization roles, which makes it difficult for the LSP to cross the gap from the (usually late in the cycle) localization manager to the (usually early in the cycle) tech comm or marcom manager.

If the general strategy is “move upstream in the content lifecycle,” then acquisition of content-development technologies makes a whole lot of sense. What seems weird to me is the acquisition of two component CMS companies. Why would TransPerfect do this?

Disclaimer #2: Transitioning from “pure speculation” to “magical thinking.” Consider yourself warned.

Revenue? I think not.

TransPerfect has been on the Inc. 5000 list as a rapidly growing company for several years running. The company is privately held, but Inc. has their 2012 revenue as $341.3M, up from $220M in 2009. That works out to around 15% annual growth. To keep that growth rate going, TransPerfect would be looking for roughly $50M in new revenue in 2013 and $60M in 2014. It’s extremely difficult to find revenue information about Vasont (and for bonus points, the company has both a sister company and a parent company), but it looks as though revenues are somewhere in the $6M to $7M range based on a couple of moderately sketchy sources. The extreme best case scenario is that Vasont/Progressive/Magnus/Whatever contributes around $10M in new revenue.
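The growth figures can be checked with a few lines of arithmetic. A quick sketch using the Inc. revenue figures quoted above ($220M in 2009, $341.3M in 2012); it assumes the compound annual growth rate simply holds for 2013 and 2014, which is a projection, not a reported result.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two revenue figures."""
    return (end / start) ** (1 / years) - 1


def projected_new_revenue(base, rate, years):
    """New revenue needed each year to keep growing at `rate` from `base`."""
    increments = []
    revenue = base
    for _ in range(years):
        increment = revenue * rate
        increments.append(increment)
        revenue += increment
    return increments


rate = cagr(220.0, 341.3, 3)  # 2009 -> 2012, in $M
new_2013, new_2014 = projected_new_revenue(341.3, rate, 2)
print(f"growth rate: {rate:.1%}")  # about 15.8%
print(f"new revenue: 2013 ${new_2013:.0f}M, 2014 ${new_2014:.0f}M")  # about $54M and $62M
```

That lines up with the “around 15%” growth and the roughly $50M and $60M targets in the text, and it shows how small a $6M to $7M acquisition is against those targets.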

Could it be the technology?

After a deeply nonrigorous search, I could not locate any patents for Vasont, Progressive Information Technologies, or Magnus Group. It’s possible that Vasont has developed technology in the CCMS space that is interesting but unpatented.

What about Astoria?

Will TransPerfect maintain two separate CCMSes? This seems thoroughly inefficient, but it’s not clear to me that it’s even possible to combine the two systems into a single one.

TransPerfect could possibly market the two systems differently to appeal to different customers. For example, Astoria might be the SaaS solution and Vasont the on-premises solution. Or maybe one system would be positioned as the “enterprise” system and one as the “small-to-medium business” system.

From a software product management point of view, none of these options makes a whole lot of sense. Even if TransPerfect intends to keep developing both systems separately, they face an uphill battle in convincing potential buyers of their plans. A few years back, SDL acquired two CCMS systems. They repositioned Contenta for “non-DITA” and S1000D solutions and left Trisoft (LiveContent Architect) in the DITA space, thus separating the two systems by content model and industry vertical. The transition was difficult for some customers.

It must be about localization sales.

I’ve reached the conclusion that this acquisition is about sales. Specifically, it’s about localization sales. If TransPerfect is selling CCMSes to various companies, that provides them with a logical pipeline of prospects for translation management systems and localization services. The $10M or so that each CCMS might produce in annual revenue is simply the entry fee for access to potential new customers for bigger and better things.

In this context, buying up direct competitors and leaving them more or less “as is” makes some amount of sense.

But will it blend??

I’m not sure this is going to work. Even if TransPerfect intends to keep both systems under development, Vasont and Astoria’s competitors will certainly highlight the risk of buying a CCMS that has an in-house competitor.

Combining the two systems and creating something that provides the best of both systems—call it “Vastoria” or “Assont”—would make more sense in the long term. Perhaps they could organize an in-house death match? (“And may the odds be ever in your favor…”)


Minimum viable content

December 2, 2013 by

…in which we explore the idea of minimum viable product as applied to technical content.

You’ve probably heard of minimum viable product, which has “just enough” features. In technical communication, minimum viable content isn’t a new idea—it’s a common survival strategy—although I think the more accurate label would be minimum defensible content.

But minimum viable content, like its product counterpart, should not be a hastily assembled scrapple (NSFVKHG: not safe for vegetarian, kosher, halal, or gourmet eaters). Instead, minimum viable content should be a strategic decision based on the organization’s overall content strategy and questions such as these:

  1. What are the regulatory requirements for this content?
  2. How does this content help meet the organization’s business goals? What is the purpose of this content?
  3. In what formats is this content needed? In which languages?
  4. Who must create the content?
  5. What is the content velocity? How quickly must it be delivered and how often will it change?

With these and other questions, you can determine your true minimum viable content.

I believe that, for many organizations, delivering minimum viable content would be a long step up from the status quo. I’ll have a lot more on this topic at tcworld India in February.

What do you think? Do you deliver minimum viable content? Or desperately triaged content?


Light-weight authoring tools are taking over

November 14, 2013 by

The basic idea of structured content—separate storage of content and formatting—is changing production workflows and, increasingly, content creation tools. In FrameMaker 12, Adobe joins the party on the tech comm side.

In the upcoming release of FrameMaker 12 (source: Adobe):

  • Adobe is launching a “FrameMaker Lite” edition called FrameMaker XML Author. This product will be a subset of FrameMaker for authoring, collaboration, and review. It will not include “regular” FrameMaker publishing features (such as save as PDF).
  • FrameMaker XML Author will be significantly less expensive than regular FrameMaker.
  • To publish the content created in FM XML Author, the content can be pushed through the DITA Open Toolkit (integrated) or through the full version of FrameMaker. (Or, presumably, any other tool that renders XML content.)

One of the criticisms of FrameMaker as an XML authoring tool has been that it is too big and too expensive. With this new approach, Adobe provides a way to license a less expensive authoring tool for content creators.

This seems like a trend to watch.



Extending our content strategy empire—at least to the Empire State

November 1, 2013 by

Longest. Interview. Ever.

Bill Swallow and I first met in person at the Help ’99 Conference in Dallas, Texas. (1999, not 1899!) Today, we are pleased to announce that Bill is joining Scriptorium as a full-time technical consultant.

As you might expect, Bill has a ton of experience in technical communication. Over the past few years, his work has emphasized content strategy in multilingual environments—how to create localization-friendly content, set up multilingual authoring and publishing workflows, and streamline localization processes.

Bill is an experienced consultant and excellent public speaker. At the upcoming Intelligent Content Conference (February 26-28, 2014), he will be discussing globalization and its relationship to intelligent content. From the description:

This presentation takes a look at intelligent content’s role in global markets, and how the entire content cycle directly affects a business’s bottom line (revenue). Though we are often concerned with cost of translation when developing content for global markets, traditional cost reduction practices (translation memory, reduced rates) simply aren’t enough. The number one means of cost control when engaging global markets is being able to establish a profitable revenue stream by delivering quality product in those markets in a manner that is meaningful to them. By employing intelligent content with attention to globalization, we can ensure that the information we produce meets market and delivery demands in a timely manner.

Although we’ve already loaded Bill up with projects through 2014 (that might be a slight exaggeration), he’ll also be doing some blogging here.

You can reach Bill at firstinitiallastname at scriptorium dot com, and he’s quite active on Twitter as billswallow.

Bill will be based near Albany, New York.

Welcome aboard, Bill!

Why publishing architecture matters in localization

October 9, 2013 by

“It’s not about the tools.” Except when it’s totally about the tools.

If your content is going to be translated, you need to understand your publishing tool’s localization support. Here are some issues to consider:

Which languages do you need?

Assuming that your first language is English, you can create some broad categories for language translation:

  • Western (European) languages, which use the Latin alphabet, such as English and FIGS (French, Italian, German, Spanish). Usually the first bloc of languages to be established.
  • CJK (Chinese, Japanese, Korean). Languages that use thousands of characters, which require larger fonts. Often referred to as “double-byte” languages because legacy character encodings for these languages use two bytes per character, where the Western languages need only one.
  • Eastern European languages, including Russian and other Slavic languages, Hungarian, and Turkish. May require a non-Latin character set such as Cyrillic.
  • Other Asian languages, such as Thai and Vietnamese, which use complex scripts. Certain letter combinations change the glyph that is required; similar to the ff or fi ligatures in English, but much more extensive.
  • Right-to-left languages, such as Arabic and Hebrew.

XML and HTML can theoretically handle all of these languages, but some authoring and publishing tools cannot.

Template-based publishing

In a template-driven workflow, you create a formatting template that spells out page size, fonts, paragraph and character styles, table styles, and so on. Templates are fantastic in a single-language workflow, but as you add languages, you need a copy of the template for each language, and this quickly becomes an overwhelming maintenance problem.

Take a simple example: a note paragraph.

NOTE:    This is a note.

For this blog post in WordPress, I have hard-coded “NOTE:” by typing it in and applying bold. But in a template-driven tool, I would create a style called note and specify that the note paragraph should always begin with the word “NOTE:”.

And now the fun begins. In a German template, I need to replace “NOTE:” with “HINWEIS:”. The rest of the template is largely identical; I’m using the same fonts and most paragraphs have the same definitions in English and German. But because I need to adjust the note, caution, and warning paragraphs—and change the word “Chapter” to “Kapitel”—I have to make a copy of the entire template document.

Basic changes can become unmanageable very quickly.

Localization string files

The current best practice for multiple-language output is to use string files. These are text files, usually XML, which separate out the language-specific items from the common formatting. This allows you to create a single formatting specification that references language-dependent information as appropriate.
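A minimal Python sketch of the idea, assuming a hypothetical string-file layout with one <str> element per label and language. The DITA Open Toolkit and other publishing tools each define their own string-file formats, so the element and attribute names here are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical string file: language-specific labels kept apart from formatting.
STRINGS_XML = """
<strings>
  <str name="note" lang="en">NOTE:</str>
  <str name="note" lang="de">HINWEIS:</str>
  <str name="chapter" lang="en">Chapter</str>
  <str name="chapter" lang="de">Kapitel</str>
</strings>
"""


def load_strings(xml_text):
    """Build a {(name, lang): text} lookup table from a string file."""
    root = ET.fromstring(xml_text.strip())
    return {(s.get("name"), s.get("lang")): s.text for s in root.iter("str")}


def label(strings, name, lang, fallback="en"):
    """Look up a label, falling back to the source language if it is missing."""
    if (name, lang) in strings:
        return strings[(name, lang)]
    return strings[(name, fallback)]


def note_paragraph(strings, text, lang):
    """One formatting rule for every language; only the prefix comes from the string file."""
    return f"{label(strings, 'note', lang)} {text}"
```

Adding a language then means adding entries to the string file, not copying the whole template.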

With string files, localization costs escalate much less than with individual template files for each language. There are still additional complications, such as configuring for right-to-left output or “unusual” requirements. (For example, many languages use “Figure 2” or similar, but Hungarian uses “2. Figure.” Just swapping out the word “Figure” doesn’t work for Hungarian.)
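The Hungarian case is why string files work better when they hold patterns rather than single words. A minimal sketch, assuming hypothetical {n} placeholders so that each language controls its own word order; “ábra” is the Hungarian word for figure, and the pattern table is illustrative.

```python
# Hypothetical per-language caption patterns; {n} marks where the number goes.
FIGURE_PATTERNS = {
    "en": "Figure {n}",
    "de": "Abbildung {n}",
    "hu": "{n}. ábra",  # Hungarian puts the number first, followed by a period
}


def figure_label(lang, number):
    """Build a figure label by filling in the language's pattern."""
    return FIGURE_PATTERNS[lang].format(n=number)
```

A word-for-word substitution would produce “ábra 2”; the pattern gives the correct “2. ábra” without any Hungarian-specific code.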

Gruesome technical details, available in two formats to accommodate different learning styles: