
Author: ScriptoriumTech

XML

Managing implementation of structured authoring


An updated version of this white paper is in Content Strategy 101. Read the entire book free online, or download the free EPUB edition.

Moving a desktop publishing–based workgroup into structured authoring requires authors to master new concepts, such as hierarchical content organization, information chunking with elements, and metadata labeling with attributes. In addition to these technical challenges, the implementation itself presents significant difficulties. This paper describes Scriptorium Publishing’s methodology for implementing structured authoring environments. This document is intended primarily as a roadmap for our clients, but it could be used as a starting point for any implementation.

Content strategy

The State of Structure

In early 2009, Scriptorium Publishing conducted a survey to measure how and why technical communicators are adopting structured authoring.

Of the 616 responses:

  • 29 percent of respondents indicated that they had already implemented structured authoring.
  • 16 percent indicated that they do not plan to implement structured authoring.
  • 14 percent were in the process of implementing structured authoring.
  • 20 percent were planning to do so.
  • 21 percent were considering it.

This report summarizes our findings on topics including the reasons for implementing structure, the adoption rate for DITA and other standards, and the selection of authoring tools.

Download PDF file (2 MB, 56 pages)


Opinion

HTML 5: Browser Wars Reprise?

Recently, I ran across an article by Rob Cherny in Dr. Dobb’s Journal. He suggests that the added features in HTML 5 combined with an end to the development of XHTML point to a brighter standards-based future. He sees closed solutions like Flash, Silverlight, and JavaFX being supplanted directly by HTML 5 code. His view is that the web owes its success to standards.

It’s tempting to agree. Standards certainly allow for collaborative growth, though I’m not the least bit convinced that collaborative growth is the foundation of the web’s success. I believe that the web’s incredible success is really traceable to the simplicity and flexibility of HTML. Each new version takes us further from that simplicity.

During the browser war years, we saw the impact of new features in HTML: incompatibility among browsers. My sense is that the success of Flash is largely due to the fact that Adobe owns both ends of the problem. It creates the tools that generate Flash code as well as the viewer. Web developers can pretty much assume that what they see when they build a Flash-based solution is what the end user will see.

I fear that we will head right back to the bad old days if HTML 5’s complex capabilities are widely employed. I suspect that ‘wait and see’ will last a pretty long time. I have other concerns about HTML 5—more on that later. What do you think—will your organization take advantage of these new capabilities as soon as they are available?

Reviews

The long and winding roads from DITA to PDF

by Sheila Loring

DITA XML is of little use to readers unless it’s converted to some kind of output. The DITA Open Toolkit (DITA OT) provides transforms and scripts that convert DITA to PDF output and a long list of other formats.

Producing PDF output from DITA content can be challenging. DITA XML is converted to an XSL-FO file, a combination of content and formatting instructions. You must know XSL-FO to customize the PDF, even just to add simple content such as headers and footers, logos, and so on.

To forgo the programming, you can choose a page layout or help authoring tool, but these tools also have pitfalls. Page layout programs have varying degrees of DITA support. Help authoring tools let you style the PDF through CSS, but you can’t fine-tune page layout as you can in page layout programs.

These are just a few examples we discuss in our white paper “Creating PDF files from DITA content.” Read the white paper online (in HTML or PDF).

DITA

Automated trademarking in structured documents – DITA in particular

Unabashed plug warning: The following entry gives a conceptual overview of a solution Scriptorium has implemented for managing trademarks in structured tagging. And we’re proud of it.

You know the problem. According to your style standards, only the first instance of a given trademarked term should display the trademark symbol. Structured documentation allows you to re-use document parts (such as DITA topics) in just about any order you like. In Manual A, the first file containing the trademarked text is, say, Topic A; in Manual B the first file containing the trademarked text is Topic E, which is also used in Manual A. Where do you put your trademark markup, and how do you maintain it when running Manual A and Manual B at approximately the same time?

Maintaining the trademarks by hand adds a level of effort that becomes non-negligible when you start considering a large number of manuals. And the process becomes error-prone – those darned human beings. Different writers might tag things in different ways, trademarks might escape notice, or markup might be inserted in inappropriate places by accident.

Isn’t this one of those problems that automated documentation was supposed to solve, not create? I once had a professor who said that computers were supposed to handle the problems that computers can solve so people could work on the problems that only people can solve.

More than one of Scriptorium’s customers has presented us with this problem, so we know it is not uncommon. We have found a way to deal with the problem in DITA, and we believe that the principle is sufficiently generic to use in non-DITA structures as well.

To begin with, forget conditional processing. It won’t help you with the problem of marking only the first instance of a term. In the example of Manual A, above, setting the condition “Manual A” would still display the trademark in Topic A and Topic E. This is not what your editor wants – and he or she will let you know it in spades if he or she is any kind of editor at all.

Scriptorium’s solution for DITA, in simple outline, is as follows:

  1. Using XSL, go through the ditamaps and remove all trademarking from the document files.

  2. Following a predefined list of trademarked and registered trademarked terms, go through the ditamaps and identify the files that contain each term. Create a temporary file that lists the relevant files in order of book occurrence. (This step prevents having to crawl through the ditamaps more than once.)

  3. Using Perl, iterate through the files listed for each term in the temporary file. Check the occurrence of each instance of the term, in text order, and evaluate whether it is a valid occurrence that requires trademarking. If so, wrap the appropriate trademark markup around it and go to the next trademark. If not, keep going through the text and the list of files until you find a valid occurrence of this trademark.

We possibly could have used XSL instead of Perl for the third step, but Perl’s text manipulation capability is much more robust than XSL’s, so we chose Perl.

In the implementation, the trademarking utility is coordinated by an Ant process. A user runs this utility just before the book is rendered for output. Being in Ant, the trademarking process could probably be integrated into the DITA Open Toolkit build system fairly easily to create a seamless, one-step production process.
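
The three steps in the outline can be sketched in a few lines. This is an illustration only, not Scriptorium’s utility: it uses Python in place of the XSL and Perl described above, works on in-memory (name, text) pairs in book order instead of reading ditamaps, and treats each topic as a plain string. The term list, topic names, and function names are hypothetical. The only DITA detail assumed is the tm element with its tmtype attribute.

```python
import re

# Hypothetical term list: term -> value for the DITA tmtype attribute.
TRADEMARKS = {"WidgetPro": "tm"}

# Matches an existing <tm ...>text</tm> element in a topic's source.
TM_MARKUP = re.compile(r"<tm\b[^>]*>(.*?)</tm>", re.DOTALL)


def strip_trademarks(text):
    """Step 1: unwrap every existing <tm> element, keeping its text."""
    return TM_MARKUP.sub(r"\1", text)


def mark_first_instances(topics, trademarks):
    """Steps 2 and 3: for each term, walk the topics in book order and
    wrap only the first occurrence found."""
    marked = [(name, strip_trademarks(text)) for name, text in topics]
    for term, tmtype in trademarks.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        wrapped = f'<tm tmtype="{tmtype}">{term}</tm>'
        for i, (name, text) in enumerate(marked):
            new_text, count = pattern.subn(lambda m: wrapped, text, count=1)
            if count:
                marked[i] = (name, new_text)
                break  # first instance marked; go to the next trademark
    return marked


# Topic E already carries markup from an earlier run and is reused in both manuals.
TOPIC_A = "Open WidgetPro to begin."
TOPIC_E = '<tm tmtype="tm">WidgetPro</tm> saves the file. Close WidgetPro.'

manual_a = mark_first_instances([("topic_a", TOPIC_A), ("topic_e", TOPIC_E)], TRADEMARKS)
manual_b = mark_first_instances([("topic_e", TOPIC_E)], TRADEMARKS)

for name, text in manual_a + manual_b:
    print(name, "->", text)
```

Because the markup is stripped and reapplied per book, Topic E ends up unmarked in Manual A and marked in Manual B, without anyone editing the source topic by hand.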

There are a number of interesting problems that arise during implementation. For example, in step 3 the process has to evaluate whether the instance of a term is valid for trademarking. Some kinds of non-valid instances of a term in the text might be:

  • The term is in an indexterm tag.

  • The term is in an href attribute.

  • The term is in a title.

  • The term is in a codeblock tag.

You might also encounter a condition where a trademarked term could be both mixed case and all uppercase. Per your style guide, only the first instance of either should be marked, but not the first instance of both. That sort of requirement makes life just a little more interesting for a coder.
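
The validity check and the mixed-case requirement can be sketched as follows. This is a rough illustration under stated assumptions, not the Perl code described above: it parses a topic with Python’s xml.etree.ElementTree, skips the three element types listed, never looks at attribute values (so href is left alone), and matches the term case-insensitively so that the mixed-case and uppercase forms count as one term. The function name, the sample topic, and the term are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Elements whose text should never receive trademark markup.
SKIP_TAGS = {"indexterm", "title", "codeblock"}


def mark_first_valid(root, term, tmtype="tm"):
    """Wrap the first valid occurrence of term (in any case) in a <tm>
    element. Returns True if an occurrence was marked."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

    def make_tm(match, rest):
        tm = ET.Element("tm", {"tmtype": tmtype})
        tm.text = match.group(0)  # keep the case used in the text
        tm.tail = rest
        return tm

    def visit(elem):
        if elem.tag in SKIP_TAGS:
            return False
        # Text before the first child element.
        match = pattern.search(elem.text or "")
        if match:
            tm = make_tm(match, elem.text[match.end():])
            elem.text = elem.text[:match.start()]
            elem.insert(0, tm)
            return True
        # Then each child, followed by the text after it, in document order.
        for index, child in enumerate(list(elem)):
            if visit(child):
                return True
            match = pattern.search(child.tail or "")
            if match:
                tm = make_tm(match, child.tail[match.end():])
                child.tail = child.tail[:match.start()]
                elem.insert(index + 1, tm)
                return True
        return False

    return visit(root)


topic = ET.fromstring(
    '<topic id="t1"><title>ACME setup</title><body>'
    '<p><indexterm>Acme</indexterm>See <xref href="acme.dita">the guide</xref>'
    ' before installing ACME or Acme tools.</p>'
    '<codeblock>acme --install</codeblock></body></topic>'
)
marked = mark_first_valid(topic, "Acme", "reg")
out = ET.tostring(topic, encoding="unicode")
print(out)
```

In this sample only the uppercase instance in the paragraph is wrapped. The title, index entry, href value, and code block are untouched, and the later mixed-case instance is left unmarked because the term has already been handled.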

In general, the issue of trademarking first instances is not a simple problem to solve, and variations in style requirements will undoubtedly add complexity and challenges to the problem. But that’s what automated documentation is supposed to be good at, right? So we humans can get back to doing the more difficult problems that only people can solve.

I’m not sure – is that really such a good deal?

Reviews

Top five reasons to like XMetaL and oXygen

by Sheila Loring

Full disclosure: We’re an XMetaL Services Provider and have no particular affiliation with oXygen.

I’m in the fortunate situation of having access to both XMetaL 5.5 and oXygen 9.3. Both are excellent XML editors for different reasons. I’d hate for Scriptorium to make me choose one over the other.

From the viewpoint of authoring XML and XSLT, here are my top five features of both editors:

oXygen

  • Apply XSLT on the fly: You can associate an XML file with an XSLT and transform the XML within oXygen. Goodbye, command line! XMetaL will convert the document to a selected output format, but you don’t choose the XSLT. That hasn’t been a big concern for me.
  • Indented code: The pretty-print option makes working with code so easy. You can set oXygen to do this automatically when you open a file or on demand. The result is code indented according to the structure. XMetaL doesn’t have pretty print.
  • Autocompleting tags: As you type an element, oXygen pops up a list of elements beginning with the typed string. You press Enter when you find the right tag, and the end tag is inserted for you. The valid attributes at any particular point are also shown in a drop-down list. XMetaL doesn’t have autocompleting tags.
  • Find/replace in one or more documents: I’ve often needed to search and replace strings in an entire directory. In XMetaL, you can only find and replace in the current document.
  • Comparing two documents or directories: Compare files by content or timestamp. In a directory, you can even filter by type so only XML files, for example, are compared. XMetaL doesn’t offer this feature.
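
For readers who have not used a pretty-print option, here is a small illustration of what indenting according to the structure means. It uses Python’s standard xml.dom.minidom module, not oXygen itself, and the sample topic and the helper name are made up for the example.

```python
from xml.dom import minidom


def pretty_print(xml_text, indent="  "):
    """Return xml_text re-serialized with one indent level per nesting level."""
    return minidom.parseString(xml_text).toprettyxml(indent=indent)


# A topic written on a single line, as it might arrive from another tool.
flat = '<topic id="intro"><title>Intro</title><body><p>Hello</p></body></topic>'
print(pretty_print(flat))
```

Each child element lands one level deeper than its parent, which makes it much easier to see where an element opens and closes.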

XMetaL

  • Auto-tagging content: You can copy and paste content from an unstructured document (a web page, for example), and XMetaL automatically wraps the content in elements. Even tables and lists are wrapped correctly. This can be handy if you have a few documents to convert. In oXygen, the content is pasted as plain text.
  • Auto-assignment of ID attributes: Never worry about coming up with unique IDs. XMetaL will assign them to the types of elements you select. Warning: The strings are quite long, as in “topic_BBEC2A36C97A4CADB130784380036FD6.” oXygen only inserts IDs on the top-level element, but full support will be added in version 10.3.
  • Auto-insertion of basic elements: When you create a document, XMetaL inserts placeholders for elements such as title, shortdesc, body, and p. It’s a small convenience. oXygen will also insert elements if you have Content Completion selected in the Preferences.
  • WYSIWYG view of tables: The table is displayed as you’d see it in a Word or FrameMaker document. In oXygen, all you see are the table element tags.
  • Reader-friendly tag view: The tags are a bit easier to read in XMetaL than in oXygen. In XMetaL, the opening and closing tags are displayed on one line when possible. This feature saves space on the page and makes the document easier to read in tag view. For example, you might have a short sentence wrapped in p tags. In XMetaL, the p tags are displayed on the same line. In oXygen, the p tags are always on separate lines. This is another convenience that doesn’t sound like a big deal, but it really makes a difference while you’re authoring.

oXygen and XMetaL have so many other strengths. I’ve just chosen my top five features.

What I’d like to see in XMetaL: The ability to indent code and the ability to drag and drop topics in the map editor.
What I’d like to see in oXygen: The ability to view a table (lines and all) in the WYSIWYG view instead of just the element tags.

So how do I choose which editor to use at a particular moment? When I’m casually authoring in XML, I choose XMetaL for all of the reasons you read above. The WYSIWYG view is more user-friendly to me. But when I’m writing XSLT or just want to get at the code of an XML document, oXygen is my choice.

Get the scoop on oXygen from http://oxygenxml.com. Read more about XMetaL at http://na.justsystems.com/index.php.

Update 6/15/09:
I’m thrilled to report that two deficiencies I reported in oXygen 9 have been addressed in the latest version of oXygen, 10.2.

  • In Author view, tables are now displayed in WYSIWYG format. Just like in your favorite word processor, you can drag and drop column rulings to resize columns. After you resize columns, the colwidth attribute in the colspec element is updated automatically. This is much easier than manually editing the colwidth.
  • In Author view, the tags are now displayed on one line when possible. Before, the tags were always on separate lines from the content.

Two more reasons to love oXygen!

Opinion

A different take on Twittering and technical writers

by Sheila Loring

Technical writers abound on Twitter, as do blog posts on how Twitter can make you a better tech writer.

I’d Rather Be Writing has an alternate take in the article Following the NBA Can Make You a Better Writer. Tom Johnson uses the analogy of Kobe Bryant and LeBron James playing their respective positions on the court. He argues that unless you’re a one-person shop, you’re doing yourself a disservice by trying to be a Jack- or Jill-of-all-trades. Play up your strengths, and minimize your weaknesses, tech writers. Read Tom’s article for more.

Reviews

Review of screen capture programs

by Sheila Loring

Matthew Ellison reviews seven screen capture programs: FullShot, HyperSnap, SnagIt, MadCap Capture, RoboScreen Capture, ScreenHunter (free), and TNT. He also points out what to look for in a screen capture tool and compares features in a handy table.

http://www.writersua.com/articles/capturetools/index.html

SnagIt lands at the top of the bunch. Matthew describes it as “the most full-featured of the capture tools reviewed in this article.”

I’m a recent SnagIt convert after using Paint Shop Pro for years. SnagIt can’t be beat for a quick, easy screen shot. I also like the torn edge options to indicate a partial shot of the GUI. But the jagged edges might be more of a creative device than a helpful visual cue. What do you think?


Webcast

Essential tools of an XML workflow in the publishing industry

by Sheila Loring

Communications from DMN provided a link to a webcast on Essential Tools of an XML Workflow. The webcast focuses on the book publishing industry. It’s interesting to hear that some publishing houses still allow authors and editors to use Microsoft Word. These folks are often viewed as incapable of learning an XML authoring tool. Many times the Word content is sent to an indexer for tagging.

The companies I’ve worked with don’t give their employees the choice of publishing tools, but if you’re Stephen King, you probably won’t be forced to use an XML tool.

Technical writers, if you know how to work with XML, your skills are portable to publishing houses. Don’t overlook this in a job search.

http://toc.oreilly.com/2009/01/webcast-video-essential-tools.html
