Scriptorium Publishing

content strategy consulting

Own your translation memory

November 28, 2016

This post is part of a series on the value proposition of localization strategies.

The source content you develop is your intellectual property. The translation of that source content is also your intellectual property, regardless of who performs the translation.

But that translation doesn’t only exist in whatever document or webpage you translated. It also lives in translation memory, and many companies fail to own this critical resource.

Failing to maintain a copy of your translation memory can be a costly mistake.

What is translation memory?

Professional translators use computer-assisted translation (CAT) tools to make their work easier. When you send a file for translation, the translator imports your file into their CAT tool. The text is extracted and broken up into short segments (usually by sentences or phrases) for translation.

Your translation memory is a vital piece of infrastructure. Maintain it well.

The translated segments are then stored in a database for later use. This database of translated segments is the translation memory.

When a new document needs to be translated, the translator uses their CAT tool to open the file and reflect it against the translation memory (that is, compare it with past translations). This reflection identifies segments that have already been translated, segments that loosely match entries in translation memory, and segments that are new.
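Translation memory has a standard exchange format, TMX (Translation Memory eXchange), which most CAT tools can import and export. The sketch below shows roughly what one stored segment pair looks like; the segment text and tool name are invented for illustration.

```xml
<!-- A single translation unit (tu) pairs a source-language segment with
     its stored translation. A real TMX file contains one tu per segment. -->
<tmx version="1.4">
  <header creationtool="example-cat-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en">
        <seg>Press the power button.</seg>
      </tuv>
      <tuv xml:lang="de">
        <seg>Drücken Sie die Ein/Aus-Taste.</seg>
      </tuv>
    </tu>
  </body>
</tmx>
```

If you own your memory, you can request a TMX export from your provider at the end of every project and archive it alongside your source content, regardless of which CAT tool either party uses.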

How is owning translation memory useful?

Owning your translation memory isn’t just about owning your intellectual property (although that’s a key reason to own it). Ownership offers you some key advantages:

  • Lock-in prevention: Owning your translation memory means you are not “locked in” with any one translation provider. You can change providers or hire additional ones at any time and still benefit from your previous translation work.
  • Source content consistency: You can use your memory to make your source content more consistent. Rather than pay translators to translate content that slightly differs in wording from other content, you can catch these differences and correct them in the source content before sending it to translators.
  • Translation consistency: Building on the previous item, variations in source content create variations in translation. If you own your translation memory, you can periodically audit and correct these variations in translation. This ensures one correct (preferred) translation for each source segment, which prevents translators from guessing which match in translation memory to use.
  • Quality assurance: Whenever you receive translations back from your providers, you can compare them against what you have in memory to ensure that the new translations are consistent with your approved and stored translations. Doing so can significantly reduce the amount of time spent on review.
  • Cost estimation: Before sending content out for translation, reflect it against your translation memory to see how many new and changed segments require translation. You can then use a word count of those segments to estimate the cost of translation.
  • Pretranslation: Should you need to use a translator who does not have a CAT tool available (not advisable, but allowable for very special cases), you can use what’s in memory to create a partial translation for the translator to work from.

To leverage your translation memory in this manner, you must purchase a CAT tool of your own. There are many tools available at varying prices, but the cost of purchasing one is minuscule compared to the savings from owning your translation memory.

Do you own your translation memory already? If so, have you found other uses for it? Please share in the comments!

tekom 2016—Götterdämmerung?

November 21, 2016

After the anti-DITA insurrection at tekom 2015, the 2016 conference took a turn in a different direction.

Here are a few highlights. Keep in mind that the conference is huge; it would take a platoon of people to cover the 250 technical sessions.

The overarching theme of tekom 2016 was intelligent content, which was covered in several complementary tracks:

tekom at the Stuttgart convention center // Photo: Alan Pringle

  • DITA Forum (English), organized by Kris Eberlein and me, focused on case studies and concrete implementation experiences in DITA. Alan Pringle of Scriptorium and Tina Meißner of parson AG discussed the development of the English and German versions of LearningDITA.
  • Intelligent Information (German) asserted that “dynamic delivery of user information is the future of technical communication: personalized information at the right time in the right location in the right format. Requirements to create intelligent information include structured authoring, component content management, metadata, intelligent delivery, use cases, and user experience.” (source)
  • Information Energy (English) largely focused on the need for Information 4.0 in response to Industry 4.0.

iiRDS standard

The tekom organization is working on a new standard, iiRDS, the Intelligent Information Request and Delivery Standard. The standard was introduced by Michael Fritz and a team of industry experts during the conference.

Here is the description from tekom, along with my translation:

Die Bereitstellung von Nutzungsinformation muss automatisiert werden, damit diese kontextabhängig und individualisiert geschehen kann und sich in Konzepte wie Industrie 4.0 oder Internet of Things integriert.

Um dieses Ziel zu erreichen fehlte es bislang an einem branchenübergreifend akzeptierten Standard. Diese Lücke will die tekom-Arbeitsgruppe “Information 4.0” aus namhaften Vertretern von CMS, Industrieanwendern, Beratern und Wissenschaftlern mit dem tekom-iiRDS schließen.


The delivery of user information must be automated so that it can be context-sensitive and personalized, and so that it integrates with concepts such as Industry 4.0 and the Internet of Things.

Until now, an accepted cross-industry standard has been missing to achieve this goal. The tekom working group Information 4.0, made up of leading representatives of CMS vendors, industrial users, consultants, and researchers, intends to close this gap with tekom-iiRDS.

Ulrike Parson of Parson AG provided a detailed overview of iiRDS. She writes this:

We need to standardize the metadata that we deliver together with our documentation and which makes our content semantically accessible. Only this way can documentation content become exchangeable and usable for multiple producers. That’s the fundamental concern of iiRDS.

Tekom plans to launch the standard with a Request for Comments phase on March 31, 2017. The standard will be released under a Creative Commons license. Currently, there is minimal information on the tekom site, but you can sign up for a newsletter.

It’s too early to provide any assessment of a standard still under development, but I have a few comments and questions:

  • The working group is a who’s who of German tech comm experts.
  • It’s unclear whether iiRDS will be a competitor to other modular standards, like DITA and S1000D, or whether those standards could be integrated with iiRDS.
  • There are a lot of flavors of Creative Commons licenses, and I’d like to know exactly what the license will be.
  • I’d like to know more about governance of the standard.
  • It’s fascinating to see the German CMS vendors support a standard after arguing vehemently at tekom 2015 that their various flavors of XML, bound to their individual systems, were Just Fine Thank You.
  • What differentiates iiRDS from DITA? (I think the answer is a metadata classification scheme based on PI-Classification.) Ms. Parson also says in her article that iiRDS will be a standard for interchange and not authoring.
  • Could that metadata be implemented in DITA? (Yes, via a metadata specialization and/or subjectScheme?)
  • Why choose iiRDS? Why not?
  • Is it really open? Open source? Usable? Configurable?
  • Will the market support this new standard?
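To sketch the subjectScheme idea raised in the questions above: DITA lets you define a controlled vocabulary in a subject scheme map and bind it to an attribute, which is one plausible home for iiRDS-style classification metadata. The subject values below are hypothetical, not taken from the iiRDS draft.

```xml
<!-- Hypothetical subject scheme map: defines lifecycle metadata values
     and binds them to the @props attribute for classifying topics. -->
<subjectScheme>
  <subjectdef keys="machine-lifecycle">
    <subjectdef keys="installation"/>
    <subjectdef keys="operation"/>
    <subjectdef keys="maintenance"/>
  </subjectdef>
  <enumerationdef>
    <attributedef name="props"/>
    <subjectdef keyref="machine-lifecycle"/>
  </enumerationdef>
</subjectScheme>
```

Topics tagged with, say, props="maintenance" could then be filtered or delivered selectively by any processor that understands the scheme.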

Information Energy

The Information Energy track focused on how information must evolve to meet the requirements of Industry 4.0. Briefly, Industry 4.0 is the concept of the intelligent machine—a factory where machines are interconnected. The concept is related to the Internet of Things, but where IoT usually focuses on consumer use cases (the refrigerator that automatically reorders food), Industry 4.0 focuses on the business applications of connected machines.

DITA Forum

Interest in DITA was strong at tekom 2016. The DITA Forum sessions were well-attended. The DITA Forum offered several case studies (SAP, rail industry, e-learning), an overview of specialization concepts, and a panel discussion on DITA versus custom XML versus no XML.

Other DITA content

Confusingly, there were other DITA presentations in addition to those in the DITA Forum. Dr. Martin Kreutzer of Empolis provided an excellent overview of different ways to manage variant content in DITA. (Slides in German)

Meanwhile, Karsten Schrempp of PANTOPIX delivered a presentation entitled Was wir von DITA lernen könnten – wenn wir denn wollten! (What we could learn from DITA—if we wanted to!) (Slides in German) Please note the use of subjunctive mood (in both his original German and my English translation).

This was an interesting presentation. Mr. Schrempp outlined various DITA features and described how these features, if not the standard itself, are potentially useful even in Germany (where DITA is notoriously unpopular). There were a few assertions that stood out to me:

  • Several times during the presentation, he reminded attendees that referring to DITA advocates as the “DITA Taliban” was not very helpful or productive. It was quite amusing, even as the repeated reminders took on a tinge of “But Brutus is an honorable man…”
  • DITA versus CMS. Mr. Schrempp tried to close the gap. In Germany, there has been the argument that DITA is “merely” a standard for content development on the file system. He pointed out that DITA used inside a CMS is still a DITA system. In German tech comm circles, this is a controversial assertion.
  • Toward the end of the presentation, in almost a throwaway comment, Mr. Schrempp mentioned a key difference between DITA CMS systems and the proprietary XML CMS systems more popular in Germany: Purchasing a DITA CMS does not lock a customer into a specific content delivery portal. Some of the DITA CMS vendors do provide content delivery portals, but DITA content can be delivered in any DITA-compatible portal. By contrast, most German CMS vendors create both authoring systems (CMS) and content delivery systems. Because each CMS uses its own flavor of XML, choosing a CMS effectively means choosing the content delivery system at the same time. This selection is decoupled in the DITA market.

In a later discussion, I spoke with Mr. Schrempp in more detail about this issue. He pointed out that the new iiRDS standard could enable a customer to buy a CMS from one vendor and a content delivery portal from another vendor. iiRDS could provide the middleware layer to cross-connect the otherwise incompatible content models.

Politics at tekom

The US election, which occurred in the middle of the conference, was a topic of discussion throughout the event. A few serious conversations drove home the worldwide impact of developments in the United States. From an Indian participant, I heard concerns about possible changes to the H1-B visa program. From an Eastern European participant, there was grave concern about the US’s continued commitment to NATO and to former Eastern Bloc countries that are now NATO members.


The theme that emerged from tekom was the need for integration of information from multiple sources. This integration requirement is driving interest in standards. The iiRDS standard is clearly aimed at the huge German machinery documentation market.

Claire Parry of CMP Services has a tekom takeaways article.

What did you think of tekom 2016?

Developing training websites in multiple languages

November 15, 2016

Tina Meißner of parson AG cowrote this case study.

This case study shows how Scriptorium Publishing created the free LearningDITA website by combining the DITA learning and training specialization, GitHub, XSLT, video, and WordPress—and how parson AG adapted those technologies to develop the German site.

DITA, which stands for Darwin Information Typing Architecture, is an open XML-based standard for creating, organizing, and managing content. The LearningDITA websites use multiple approaches to educate students about DITA. Lessons include step-by-step instructions, guided and independent exercises, and assessment questions. Courses also provide resources, such as links to instructional videos.

Managing source content and video

The LearningDITA project is created as DITA XML files. The DITA learning and training specialization provides specific structures for training content (for example, lesson objectives and test questions). The LearningDITA source content includes conceptual information, step-by-step instructions, exercises, and quiz questions.
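For example, a single-select quiz question in the learning and training specialization looks roughly like this; the question text is invented, and you should check the specialization's documentation for the exact content models.

```xml
<!-- A single-select assessment question. The empty <lcCorrectResponse/>
     element marks which answer option is correct. -->
<lcSingleSelect id="element-for-steps">
  <lcQuestion>Which element contains a single step in a DITA task?</lcQuestion>
  <lcAnswerOptionGroup>
    <lcAnswerOption>
      <lcAnswerContent>&lt;step&gt;</lcAnswerContent>
      <lcCorrectResponse/>
    </lcAnswerOption>
    <lcAnswerOption>
      <lcAnswerContent>&lt;li&gt;</lcAnswerContent>
    </lcAnswerOption>
  </lcAnswerOptionGroup>
</lcSingleSelect>
```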

For file storage, Scriptorium chose GitHub, a web-based repository based on the open-source Git version control system. With a free GitHub account, authors from all over the world can contribute and revise content in the LearningDITA repository.

Anyone can download the open-source files from GitHub and adapt the content for their own purposes.

In addition to source control, GitHub provides a project wiki. In the LearningDITA project, the wiki provides editorial and content development guidelines. The guidelines promote consistency in the source files.

The video content in courses is not part of the DITA source. Instead, we record and edit video clips with Adobe Captivate. Although we could have used an open-source tool to create the video content, we already had a license for Captivate. We post the videos to YouTube, the free video hosting service. YouTube provides a no-cost alternative to the challenges of hosting and maintaining video content on your own web server.

Creating the site

Scriptorium already had experience with WordPress, an open-source system for managing and publishing web content, so we decided to publish as a WordPress-based site. We identified a learning management system (LMS) that integrates into WordPress. The LMS is a commercial system with a small licensing fee, but it supported our major requirements, including interactive quizzes and management of student accounts. There was no business justification to spend time and money creating our own LMS in WordPress when we could buy an inexpensive tool.

Scriptorium developed a process that transforms the DITA XML files into WordPress-compatible XML. This solution was based on the DITA Open Toolkit, a collection of open-source technologies that convert DITA XML into HTML, PDF, and other formats. We built an Extensible Stylesheet Language Transformations (XSLT) stylesheet in the toolkit to convert DITA XML into WordPress-compatible XML. We imported the transformed XML content into WordPress, made some minor manual adjustments in the LMS to ensure that test questions are associated with the right lessons, and published the course.
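Scriptorium's actual stylesheet isn't shown in this post, but the core of such a transform is a handful of XSLT templates along these lines; the output element names on the WordPress side are invented for illustration.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Map a DITA topic to an item that a WordPress XML importer can
       consume. The <item>/<title>/<content> names are illustrative. -->
  <xsl:template match="topic">
    <item>
      <title><xsl:value-of select="title"/></title>
      <content><xsl:apply-templates select="body"/></content>
    </item>
  </xsl:template>

  <!-- Convert DITA paragraphs to HTML paragraphs inside the content. -->
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```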

Localizing the DITA content

To localize the course files created in DITA, parson downloaded the English files from the GitHub project and copied them. We kept the structure of DITA elements and replaced the textual content inside the elements with the German translation.

The LearningDITA courses provide a large number of code snippets that are embedded in step-by-step instructions. Users can follow the instructions and compare their result with a sample solution presented in an associated DITA sample file. parson localized the sample files to match the code snippets in the instructions. We also localized the id attributes and the file names of the sample files. This does not cause any problems because the sample files are not referenced by DITA maps or topics. In addition, we replaced screenshots of the sample solution that were taken in the visual mode of an XML editor.

parson also needed to decide whether to localize the file names of the course files. Because the file names form part of the resulting URL, German names are easier for German users, but localizing them also meant adapting all topic references.
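For instance, if a course file is renamed for German, every map that references it must be updated as well (the file names here are invented):

```xml
<!-- English map entry: -->
<topicref href="getting-started.dita"/>

<!-- After localizing the file name, every reference must change too: -->
<topicref href="erste-schritte.dita"/>
```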

Finding the right DITA terms in German

Finding established equivalents for DITA terms was another difficult task. parson mostly used direct translations to describe the function of DITA elements in the text, such as paragraph = Absatz, note = Hinweis, and substeps = Teilschritte. If there was no established German translation, parson needed to create one; for example, Auswahltabelle for a choice table.

Another challenge was to find a compromise between comprehensibility (that is, using German translations) and recognition by users (that is, using Anglicisms). parson tried to avoid Anglicisms as much as possible. Nevertheless, we decided to keep frequently used, DITA-specific terms to make them easily recognizable. Examples are: (task) topic = (Task-)Topic, map = Map, and body = Body.

We also used the Anglicism if a term did not have a German equivalent and paraphrasing would have been too complicated or too long in the context. These were case-by-case decisions. For example, we translated “inline element” with Inline-Element to align it with the translation for “block element” (= Block-Element) because both terms are often mentioned in one breath.

Setting up the website

Scriptorium provided parson with a list of required WordPress plugins and settings so that a hosting agency could reproduce the structure and layout of the English website. parson changed colors and fonts according to the corporate design. We also adapted the logo by changing the colors and by localizing the text “learningDITA” to “DITA lernen.”

parson localized the HTML content on the website (that is, everything but the DITA course content) and the explanatory texts that come with the plugins. The explanatory texts confirm input by users or indicate errors during registration, login, and so on.

We customized the transformation from DITA to WordPress-compatible XML by localizing auto-texts that are used by the transformation. We also implemented the transformation in our XML editor oXygen to enhance usability. We can now start the transformation directly from oXygen.

Legal requirements for websites differ from one country to another. In Germany, Austria, and Switzerland, websites must state the publishing organization or person, including name, contact information, trade registry number, etc. parson added this information, commonly known as an Impressum. We also localized the privacy policy (Datenschutz) according to German legislation.


  • Open-source solutions are free but often expensive in terms of effort. If a licensed add-on to an open-source solution does what you need, does it make more financial sense to use it instead of creating your own solution? Is it worth the development effort to completely automate a process if that process is used infrequently?
  • Do not make assumptions about the cloud-based services people use. For example, not everyone wants a Google account because of privacy concerns, and Google services are blocked in some locations.
  • When content is translated, be prepared for that process to uncover errors. You need a workflow to address them.
  • Try to balance between translating DITA terms and adopting the original terms. These decisions can depend on the amount and the kind of Anglicisms used in your language, and on your target group.
  • When thinking about what to localize (id attributes, file names), weigh the work you would have to put in against the benefit of the consistency.
  • Consider legal requirements for different countries.
  • Establish a change process for subsequent changes after publication. The content always lives longer than you think.

After graduating in physics, Tina Meißner decided on vocational training in technical writing. She is doing a tekom traineeship at parson AG. She works in documentation projects and is responsible for the localization of LearningDITA into German.

Slides from the tcworld presentation on the Learning DITA project:

Viva Las Vegas: LavaCon 2016

November 7, 2016

Las Vegas has something for everyone. Whether you enjoy seeing shows, playing slot machines, eating delicious meals, or just exploring the many hotels on the strip, there’s no shortage of exciting things to do. The same can be said of LavaCon Las Vegas, which offered all kinds of sessions and activities.

The lights on Fremont Street: daytime anytime!

Before I attended LavaCon, I had never been to Las Vegas. My first impression? Total sensory overload.

LavaCon was held at the Golden Nugget hotel on Fremont Street. With its lighted ceiling, zip line, and variety of costumed street performers, Fremont Street was an overwhelming but fun experience.

I went to LavaCon as an exhibitor for Scriptorium, along with Bill Swallow, who gave a presentation on The Value Proposition of Translation Strategies. Although I was busy at Scriptorium’s booth for most of the conference, I did attend a couple of sessions:

  • Captain Content: Civil War?: In this presentation, Alan J. Porter broke down content conflicts into two major viewpoints: those who see the benefit of change and those who don’t want it, illustrated with cartoon drawings of original heroes and villains. I found this approach appealing as both a content strategist and a comic book geek.
  • TechComm Throwdown: In keeping with election season, this session pitted industry professionals against each other in short, hilarious debates moderated by Bernard Aschwanden. The speakers defended their positions on whether the customer is always right, the benefits of working remotely, and which is better: big or small business.

The sessions I attended were both entertaining and informative, and so were the conversations I had with attendees and other exhibitors. Here are a few patterns I observed throughout the conference…

Spirit of collaboration

Everything about LavaCon Las Vegas was designed to foster teamwork. On the opening day, attendees tossed colored ping pong balls into the air and picked them up at random. The color they chose represented their “tribe” for the duration of the conference. Each tribe worked together to win prizes for most tweets, most steps taken, and most water consumed. As an exhibitor, I wasn’t part of a tribe, but I still felt the positive impact that these tribes had on LavaCon’s collaborative atmosphere.

Other activities also carried the spirit of collaboration throughout the conference. The group activities in the evenings gave exhibitors and attendees an environment where we could relax and talk to each other more informally. I had a blast singing on karaoke night and watching speakers give “surprise” presentations at the networking dinner sponsored by Adobe.

Attendees could enter a drawing for a cash prize if they talked to all of the exhibitors and got our signatures. Instead of simply stopping by for a signature (and some chocolate!), most attendees who visited Scriptorium’s booth spent time talking with us and asking in-depth questions. This resulted in plenty of interesting discussions about content strategy, including…

Planning for the future

Most companies who contact us have problems with their content development processes. While we had our fair share of content problem discussions at the booth — for example, what to do about content silos, or how to deal with change management — we also saw an uptick in conversations about long-term content strategies.

Several people who visited the booth discussed their future content plans with us. Whether they were researching structured content, anticipating new localization requirements, or looking into scalable solutions, these people were clearly invested in long-term business goals. They didn’t have specific content problems to solve, but were instead looking for ways to prevent those problems from happening.

At some of the conferences and webcasts I’ve attended in the past year, I’ve seen an increased interest in content strategies that encompass more than just tech comm. This focus on company-wide content strategies, plus the new interest I saw in long-term planning, indicates that more people are thinking critically about their content and how it can serve their business goals.

Interest in learning DITA

Around this time last year, Scriptorium had just started introducing LearningDITA, our free resource for learning DITA, at conferences. People were interested in LearningDITA even when it was brand-new, and this year, we saw that interest continue.

Some people who stopped by the booth told us that they were using LearningDITA as a starting point for training their tech comm teams. Others were evaluating whether DITA was a good fit for them, and told us that LearningDITA had been a valuable resource during that process. Overall, people expressed a desire to learn more about DITA, whether or not they had immediate plans to start using it.

It’s not LavaCon without lava!

I had a great time at LavaCon, not only exploring Las Vegas for the first time, but also having excellent discussions about content strategy with many people. In this case, the “what happens in Vegas stays in Vegas” rule doesn’t apply—the things we learned and the connections we made will last long after this conference.

The horror! More content strategy monsters!

October 31, 2016

The ghoulish nasties I depicted two years ago in Content Strategy vs. The Undead continue to haunt content strategy implementations and information development projects.

They just… won’t… DIE!

However, they are not the only monsters that can terrorize your content strategy implementation.

If you are still plagued by zombies, vampires, and mummies, read our previous undead post to arm yourself appropriately. The horrors that follow take other approaches to defeat.

Don’t let your content strategy implementation become a horror show!

The Blob is an amorphous extra-terrestrial creature that consumes everything in its path. In content strategy, the blob is an unexpected critical requirement that lands mid-implementation, consuming your project’s resources faster than you can allocate them. As in the classic movie, the only way to defeat this creature is to freeze it in its tracks until you have the means to properly wrangle it.

The Fly was created through the careless actions of an overzealous scientist. Despite having a brilliant plan and amazing technology, one small oversight was enough to turn his masterpiece into a horror show. When executing your content strategy, pay attention to all details to avoid any unwanted surprises. You may have the best of intentions with your implementation, but one careless mistake could derail your entire project.

The killer great white shark from Jaws is truly a thing of nightmares. Perhaps the most horrific element of this creature’s story is how realistic the situation seemed. Our takeaway from Jaws is to never downplay a hazardous situation while conducting business as usual. You may learn that your chosen technology vendors have suddenly been purchased by a competitor, or perhaps your own company has been acquired in the middle of your implementation. When this level of uncertainty arises, pause all implementation activity and assess the situation (clear the beach and survey the waters). You must adjust your strategy and prepare yourself for new risks. At minimum you may need a bigger boat.

Have you encountered any hideous creatures in your work? Please share in the comments!

And, as a general rule, always avoid anything “abby-normal.”

Easy ways to undermine marketing with content strategy

October 24, 2016

Does your content deliver on your marketing promises?

“Our products lead the industry…”

but we can’t create a decent mobile experience.

Karen McGrane writes in the Harvard Business Review:

You don’t get to decide which device your customer uses to access the internet. They get to choose. It’s your responsibility to deliver essentially the same experience to them — deliver a good experience to them — whatever device they choose to use.

Any claim of cutting-edge industry leadership must be supported by a good web site experience, and that includes the mobile experience.

“We serve clients in 47 countries…”

provided that they speak English, because we do not offer localized products or content. Also, we use a lot of jargon in English, so good luck to customers with limited English proficiency.

“We care deeply about our customers…”

but only those users with perfect color vision, excellent fine-motor control, and pristine hearing. We use tiny, trendy low-contrast type. We do not provide alternatives to mouse-based navigation. We make heavy use of video, and we do not provide captions as an alternative to listening to the video.

“Our product offering is flexible and configurable…”

but our web site doesn’t work on Safari.

“We offer a luxury experience…”

as long as you don’t need help charging the batteries because that document is incomprehensible in any language.





Localization strategy: improving time to market

October 17, 2016

This post is part of a series on the value proposition of localization strategies.

You can make localization “better” by taking a look at localization value. Quality and cost are important value factors, but improved time to market returns the greatest value.

Improving time to market for localized products and content is no easy task. It’s not as simple as adding more translators to the effort; that may cause more problems (and more delays). Improving time to market involves moving localization up the project chain, and to do so effectively requires a localization strategy.

An effective localization strategy begins with the same foundation as other effective communication strategies: an audience-first approach. Who are you targeting? For what purpose? What do they need? What do they value?

Inspect every detail!

At the very beginning of a project, the entire audience needs to be considered for every aspect of the project.

  • Marketing campaigns must be culturally and legally appropriate
  • Sales materials and contracts must be reviewed
  • Pricing must be adjusted in some cases
  • Product design must consider all local requirements

The list goes on and on. Every aspect of the project must be evaluated against every aspect of every target market. Doing so will identify variations in requirements before they become problems, and will identify opportunities before they are lost.

What does all of this have to do with time to market? It all starts with setting realistic expectations. The more you know about your target audiences, the earlier you can begin to define requirements, avoid unexpected issues, and plan your release strategy. You are also able to take an iterative approach to translation that runs parallel to development, and build localization testing into your core product testing.

In short, implementing a localization strategy helps you remove many unknowns from the tail end of a project and allows you to optimally target all audiences from the very beginning.

Have you experienced an unpleasant localization surprise at the tail end of a project? Have you improved your time to market by making changes to how you handle localization? Please share your stories in the comments.

DITA to InDesign: the gruesome details

October 10, 2016 by

We’ve written before on what lurks beneath the surface of an InDesign file, and how drastically it differs from the DITA standard. When you’re looking at going from DITA to InDesign, though, there’s a lot that you need to take into consideration before you jump in.

An apt visualization of today’s theme. Lisa Risager, Flickr

DITA separates formatting from content, but formatting content is one of the most powerful features that InDesign offers. You need to prepare your DITA content for the transition from a no- or low-design environment to a high-design platform. You also need to ensure that your InDesign environment is ready, or you’ll wind up with inconsistently formatted content, or worse, output that will crash the program when you try to import it.

The DITA side

Taxonomy: You need to make sure that you know your content. InDesign offers a wide range of ways to format your content, but there’s not always a direct mapping from DITA. For example, a paragraph element could have a basic body style applied, or perhaps it needs a style with a different margin. How do you determine this?

  • Robust metadata will allow you to identify elements that need to be treated differently. The quickest way is to use the outputclass attribute, but for subtle variations on a style, you may need to consider…
  • Specialization allows you to define custom structures. If you have a type of admonition that lacks a label and adds detail to text nearby, you might create a callout element.
  • Don’t forget the stock offerings of the DITA specification. Images in particular can already specify things like alignment, which may fulfill your needs.
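As a rough sketch of the first two approaches, here is what the DITA side might look like. The outputclass value and the callout element name are hypothetical examples; the actual names would depend on your taxonomy and your plugin’s style mapping.

```xml
<!-- Approach 1: outputclass signals that this paragraph maps to a
     different InDesign paragraph style (e.g., one with a wider margin).
     The value "indented" is invented for illustration. -->
<p outputclass="indented">This paragraph needs extra left margin.</p>

<!-- Approach 2: a hypothetical specialized element for an unlabeled
     admonition that annotates nearby text. -->
<callout>Check the torque setting before reassembly.</callout>
```

Either way, the goal is the same: give the DITA-to-InDesign transform an unambiguous hook for choosing a style.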

Information overload: As a desktop publishing (DTP) platform, InDesign takes shortcuts in some areas. Images, in particular, are a challenge. When you add images to your DITA content, be sure to include both height and width information. This is due to the way that InDesign stores image display information. Rather than saying that an image appears at a point in the document and is X pixels wide and Y pixels high, InDesign identifies an anchor point, then a series of four points that describes a frame, and then places the image within that frame. Without both height and width, or an image format from which those dimensions can be read, you’ll have trouble controlling how the image displays. The moral of the story: if you have the information available, include it.
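For example, a DITA image reference that carries both dimensions (the file path and values here are placeholders) gives the transform everything it needs to compute the four corner points of the InDesign frame around the anchor:

```xml
<!-- Both height and width are supplied, so the frame geometry can be
     calculated instead of guessed. -->
<image href="images/assembly-diagram.png" width="400px" height="250px">
  <alt>Assembly diagram</alt>
</image>
```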

Just plain weird stuff: While Adobe has made the IDML standard public, InDesign itself isn’t anticipating someone coming along and placing raw code into a template. This results in some very strange behavior.

  • If you have a paragraph element that ends with bolded text, when you place your output into a template, all of the text following that paragraph element will be bolded until InDesign finds another character style element.
  • If something goes wrong with your output and InDesign doesn’t like it, one of two things will happen: the offending content will be dropped, or InDesign will crash without any error. Debugging this can be an exercise in patience.
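The bold-text run-on above suggests its own workaround: since InDesign keeps applying the style until it finds another character style element, the generated ICML can explicitly reset the character style after a styled run. A simplified sketch (style names are illustrative; `$ID/[No character style]` is ICML’s built-in default):

```xml
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Body">
  <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">
    <Content>bolded text at the end of the paragraph</Content>
  </CharacterStyleRange>
  <!-- Without an explicit reset like this, InDesign continues applying
       Bold to everything that follows. -->
  <CharacterStyleRange
      AppliedCharacterStyle="CharacterStyle/$ID/[No character style]">
    <Content></Content>
  </CharacterStyleRange>
</ParagraphStyleRange>
```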

The InDesign side

The most important part of preparing the InDesign portion of your workflow is getting your templates in order. They should either be prepared before you begin working on your DITA taxonomy requirements, or developed alongside them.

  • Do you need more than one template, or can you use a master template? If you need specific layouts or master pages, you’ll need multiple templates. If the paragraph, character, or object styles between those templates differ, you’ll need to communicate that to whoever is working on your plugin.
  • How well-defined are your object styles? You need to take into account not only things like margins, but also word wrap.
  • Do any of your style names have special characters in them? You need to avoid that. The style declarations on the DITA side need to be escaped if so, and if they’re not escaped properly, InDesign will crash when you try to place your content into the template.
  • Do your paragraph styles have their hyphenation properties set up correctly? If you know you have tables that will be narrow, you need to be careful about this. If the text in a table cell is too short to become overset, but long enough to hyphenate and then become overset, InDesign will crash when you try to place your content into the template.


While transforming DITA into an ICML file will allow you to quickly get your content into InDesign, it isn’t a smart process.

  • Since an ICML file lacks any kind of page information, the only page breaks that will appear are those that are dictated by your paragraph styles.
  • An image only knows where its anchor point is relative to the body content it appears near. This means that if you have multiple images in close proximity, there’s no way to prevent them from overlapping.
  • When you auto-flow content into a page in a template, it uses the same master page throughout. If you have sections of your content that require a different master page, you’ll have to apply it by hand.

Despite these limitations, being able to leverage DITA’s portability and reusability with InDesign’s high-design environment remains a tantalizing prospect. PDF allows for quick, consistent publishing of your content, but any edits require new output, and any formatting updates require changes to the plugin. If you have a production pipeline that utilizes InDesign and you value the fine control that it grants you, a DITA to InDesign workflow may be worth it.

Using XML to integrate database content and desktop publishing

October 3, 2016 by

This article shows how Scriptorium helped one company use XML to integrate information in a database with desktop publishing content.

In most enterprises, useful content exists in a number of different tools or databases. To include that content in your publications, you might use traditional ways of moving the information, such as copy and paste. However, it can be far more reliable, repeatable, and efficient to automate conversion from those tools and integrate the result directly into your publishing solutions.

A large manufacturer of integrated circuits used Adobe FrameMaker (unstructured) to produce reference and conceptual documentation for their very complex processors. Each processor had thousands of registers. Most registers contained multiple bit fields, each containing specific pieces of information, all of which needed to be documented.

The information necessary for documenting the processors and their registers was maintained in the manufacturer’s chip-design database. To produce reference documentation, the writers copied the information from the chip design database and pasted it into FrameMaker. The writers could—and did—edit the information, but usually they were constrained to copy content and apply formatting.

The descriptions for each register consisted of two main parts:

  • A table, which described the bit fields and enumerated values (if any).
  • A diagram, which provided a quick reference to the name, location, and size of each bit field in the register, in addition to other information about the bit fields.

Depending on the size of the register and the number of fields, these diagrams could be quite complicated. The diagrams were constructed using FrameMaker tables (which were easier to manipulate than graphics created with FrameMaker’s drawing tools or a separate drawing program).

There were several problems inherent in the documentation process:

  • When a chip was in development, the specifications for the registers and their contents could change rapidly. The copy-and-paste methodology made it hard to keep up with the changing and evolving design.
  • If the writers modified text when copying and pasting, those changes weren’t preserved in the chip design database.
  • Creating and maintaining the illustrations required a large amount of time and thought. Manipulating FrameMaker tables and table cell borders was inefficient, and the whole process was prone to error.

Additionally, the manufacturer did not want to transition to a new tool set, so remaining in FrameMaker was a requirement. Unstructured FrameMaker was perfectly good for documenting the conceptual information; it was only the reference information that was difficult to maintain.

The manufacturer was aware that the reference information could be exported from the database in IP-XACT, an XML-based, vendor-neutral schema for describing chips and their components. However, they needed some help converting the IP-XACT into something that could integrate with FrameMaker, which is when they reached out to Scriptorium.

Scriptorium suggested that an XSL transform could convert the IP-XACT sources into files that could be imported into structured FrameMaker. FrameMaker allows mixing structured and unstructured files in FrameMaker books, so all of the conceptual information could still be maintained as unstructured FrameMaker files. The reference sections could be replaced in the book files whenever they were updated.
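A heavily simplified sketch of such a transform follows. The IP-XACT namespace matches the published 1685-2009 schema, but the output element names (RegisterSection, RegisterTitle, FieldRow) are invented for illustration; a real transform would target the element definitions used by the structured FrameMaker application.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:spirit="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009">

  <!-- Turn each IP-XACT register into a structured section. -->
  <xsl:template match="spirit:register">
    <RegisterSection>
      <!-- A persistent cross-reference target derived from the
           register name, so writers can link to it from
           conceptual content. -->
      <RegisterTitle id="{spirit:name}">
        <xsl:value-of select="spirit:name"/>
      </RegisterTitle>
      <xsl:apply-templates select="spirit:field"/>
    </RegisterSection>
  </xsl:template>

  <!-- Each bit field becomes a row in the register table. -->
  <xsl:template match="spirit:field">
    <FieldRow>
      <xsl:value-of select="spirit:name"/>
    </FieldRow>
  </xsl:template>

</xsl:stylesheet>
```

The real transform did considerably more (summary tables, register diagrams), but the pattern is the same: match IP-XACT elements, emit elements the structured FrameMaker application understands.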

Writers were granted access to the chip design database, so that text corrections to the register and field descriptions could be made in one place.

In addition to solving the basic problem (extracting the descriptions from the database and converting them to FrameMaker), the transform organized the registers in a coherent series of chapters or sections, including a linked summary table for each set of registers, and built the register diagrams. The transform also created persistent cross-reference targets, so that writers could easily create cross-references to register descriptions from conceptual content.

Once the structured files were imported to structured FrameMaker, a Scriptorium-created custom ExtendScript performed final cleanup.

The resulting documentation and diagrams were clear, consistent, and could be re-created from updated sources in a matter of minutes.

The manufacturer (and the success of the project) benefited from these factors:

  • The chip-design database contained almost all the information needed to document the registers.
  • The chip-design database could export XML (IP-XACT).
  • The content in the chip-design database was consistent and of sufficient quality to expose to customers.
  • The writing team could access the chip-design database to enhance and correct the information, where necessary.

Automatically converting content from reliable and consistent sources produced reliable and consistent documentation, which then freed the team to focus their energies on conceptual content.

Consulting lite: life at Scriptorium

September 26, 2016 by

Scriptorium is hiring. Our consulting jobs are a unique blend that you don’t see in many other places. Over and over, I’ve found myself struggling to explain this blend to candidates. So here is an attempt to describe life at Scriptorium.

Job structure

Our technical consultants are full-time, permanent employees with benefits. Our interns are full-time, temporary employees with benefits. After 6-12 months, interns are eligible to become permanent employees.

Client load

Employees typically work on multiple client projects in a single week. You might deliver a draft document to one client, then turn your attention to updates on another project, receive a few review comments from a third client, and clear a support ticket from another client.

Each project has a project lead. For small projects, the lead might also do the work; for larger projects, the lead coordinates the project team.

One of the biggest challenges is remembering different communication requirements. For example, we use Basecamp for project collaboration on some projects. For others, we use client infrastructure (usually SharePoint).

Client mix

Our clients come from a cross-section of industries: finance, life sciences, education, high-tech, heavy machinery, telecommunications, state and federal government, non-profit associations, semiconductors, and others.

We specialize in working with organizations that have content problems, and they are found everywhere!

Our consultants are exposed to content workflows across many industries.

Sales and marketing responsibilities

Unlike freelancers, our employees are not responsible for hunting down their own new projects. But our employees do have some sales and marketing responsibilities. These include:

  • Participating in social networking
  • Writing blog posts or other articles
  • Presenting at industry conferences
  • Ensuring that clients are happy
  • Noticing when a client asks for additional work and making sure their issue is addressed promptly
  • Contributing to proposals


All of our consultants travel. Some of that travel is for conferences and industry events, and some is for client visits. No consultant is expected to travel more than 25% of the time.

Cloud systems

Our office space is in Research Triangle Park (Durham), North Carolina. Most of our employees are based there, but all of our systems are cloud-based. Employees can access the information they need at home, or remotely, or while traveling.

Scriptorium versus corporate life

It’s hard to generalize about Scriptorium versus All Possible Full-Time Jobs. But here are some things to consider:

  • Domain knowledge (expertise about the company’s products and industry) is more valuable in a corporate job. Scriptorium employees move from project to project, so the domain changes constantly.
  • If you like change, Scriptorium may be the place for you. We learn new things in every project, and we are always looking for better ways to do things. If you prefer to develop deep expertise in a single set of tools or a specific approach, this is not the right place for you.
  • As a change of pace from client work, you might find yourself writing blog posts or working on internal processes.

Scriptorium versus freelance life

Bring on the additional generalizations! Working as a consultant at Scriptorium is basically Consulting Lite:

  • Like a freelancer, you have project variety and an ever-changing list of assignments.
  • You do not have to do all your own sales and marketing work.
  • Scriptorium handles administrative support (payroll, taxes, invoicing, and office management tasks).
  • You are paid a salary and not an hourly rate.
  • You have coworkers who can QA your work before it’s sent to the client.
  • You have an office location (if based in RTP), and an internal communication network to discuss projects, share juicy gossip, and abuse the emoji capabilities.


Does consulting lite sound appealing? We’re hiring.