Content management

Change management Content management Structured content

Interview with a vampire: interviewing to find processes that drain efficiency

When you’re considering an overhaul of your publishing workflow, you may find yourself becoming a metaphorical version of Van Helsing, the vampire-hunting character from Bram Stoker’s Dracula (and the many, many movies based on the Dracula story). You need to find all the efficiency-draining aspects of your current process and eliminate them.

Content management

Talk amongst yourselves…introducing forums.scriptorium.com

Our web site now has forums for discussions of technical communication issues. We want to give you, our readers, a venue where you can set your own agenda instead of just responding to our blog posts.

Given Scriptorium’s particular interests, I expect to see a lot of emphasis on publishing automation and XML. But frankly, we don’t know exactly what might happen. Communities often develop in unexpected ways. It will be up to you—and us—to figure out what direction these forums go.

(We have an internal pool on how long before Godwin’s law is applied.)

The forums are available in our main site navigation. There are also RSS feeds so you can subscribe to a topic or category of interest. Or, if you prefer, you can get email notifications for new forum posts.

And how do we feel about this launch? We’re…perfectly calm.

Please join the conversation.

Content management

The elephant in the room—publishers and e-books

Two years ago, Nate Anderson wrote this on Ars Technica:

“The book business, though far older than the recorded music business, is still lucky enough to have time on its side: no e-book reader currently offers a better reading experience than paper.”

That’s what makes Apple’s iPad announcement so important. Books will now face stiff competition from e-books as the e-book experience improves.

Elephant in the room // flickr: mobilestreetlife

Meanwhile, the publishing industry (with the notable exception of O’Reilly Media) is desperately trying to avoid the inevitable. (For a slightly happier take, see BusinessWeek.)

Publishers are supposed to filter, edit, produce, distribute, and market content. Pre-Internet, all of these things were difficult and required significant financial resources. Today, many are easy and all are cheap.

There’s only one other thing.

Content.

But the revenue split between publishers and authors does not—yet—reflect the division of labor. The business relationships are still built on the idea that authors can’t exist without publishers. In fact, it’s the reverse that’s true.

Only the big publishers can get your book into every bookstore in the country. However, I’ve got news for you: Unless your name is on an elite shortlist with the likes of Dan Brown, John Grisham, Nora Roberts, and J.K. Rowling, it probably doesn’t matter.

If you know your audience, you can reach them at least as well as a big publisher can. And you need to reach a lot fewer people to succeed as an independent. The general rule of thumb is a 10-to-1 ratio. You’ll make the same amount selling 10,000 books through a traditional publisher as 1,000 books on your own.

It’s not so difficult to hire freelancers (especially in this economy) to edit and produce your book, if that’s not your cup of tea. Distribution is doable—Amazon is easy, bookstores a little more challenging. This is where e-books will accelerate the change—the challenges of shelf space and returns simply disappear.

And even if you have a publisher, they will expect you to do most of the marketing.

So, what will successful publishers look like in 2020?

  • They will provide editorial and production support for writers who do not want to deal with technical issues.
  • They will support authors in marketing by helping them with blogging platforms and other social media efforts.
  • They will get a much smaller cut of revenues than they currently do.

Actually, that looks a lot like Lulu.

    Content management

    ePub + tech pub = ?

    At Scriptorium earlier this week, we all watched live blogs of the iPad announcement. (What else would you expect from a bunch of techies?) One feature of the iPad that really got us talking (and thinking) is its support of the ePub open standard for ebooks.

    ePub is basically a collection of XHTML files zipped up with some baggage files. Considering a lot of technical documentation groups create HTML output as a deliverable, it’s likely not a huge step further to create an ePub version of the content. There is a transform for DocBook to ePub; there is a similar effort underway for DITA. You can also save InDesign files to ePub.
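To make the “baggage files” a little more concrete: as far as I know, an ePub archive carries a plain-text mimetype file (containing application/epub+zip), a META-INF/container.xml file, a package (.opf) file that lists the content, and a navigation file, alongside the XHTML itself. The sketch below shows a minimal container.xml. The path OEBPS/content.opf is only an assumption for illustration; the package file can live anywhere inside the archive.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- META-INF/container.xml: points the reading system at the package file.
     OEBPS/content.opf is an assumed (hypothetical) location. -->
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
        media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
```

If your toolchain already produces clean XHTML, these wrapper files are mostly fixed text that a build script can generate.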

    While the paths to creating an ePub version seem pretty straightforward, does it make sense to release technical content as an ebook? I think a lot of the same reasons for releasing online content apply (less tree death, no printing costs, and interactivity, in particular), but there are other issues to consider, too: audience, how quickly ebook readers and software become widespread, how the features and benefits of the format stack up against those of PDF files and browser-based help, and so on. And there’s also the issue of actually leveraging the features of an output instead of merely doing the minimum of releasing text and images in that format. (In the PDF version of a user manual, have you ever clicked an entry in the table of contents only to discover the TOC has no links? When that happens, I assume the company that released the content was more interested in using the format to offload the printing costs onto me and less interested in using PDF as a way to make my life easier.)

    The technology supporting ebooks will continue to evolve, and there likely will be a battle to see which ebook file format(s) will reign supreme. (I suspect Apple’s choice of the ePub format will raise that format’s prospects.) While the file formats get shaken out and ebooks continue to emerge as a way to disseminate content, technical communicators would be wise to determine how the format could fit into their strategies for getting information to end users.

    What considerations come to your mind when evaluating the possibility of releasing your content in ePub (or other ebook) format?

    Content management DITA

    White paper on whitespace (and removing it)

    When I first started importing DITA and other XML files into structured FrameMaker, I was surprised by the excessive whitespace that appeared in the files. Even more surprising (in FrameMaker 8.0) were the red comments displayed via the EDD that said that some whitespace was invalid (these no longer appear in FrameMaker 9).

    The whitespace was visible because of an odd decision by Adobe to handle all XML whitespace as if it were significant. (XML divides the world into significant and insignificant whitespace; most XML tools treat whitespace as insignificant except where necessary…think <codeblock> elements.) This approach to whitespace exists in both FrameMaker and InDesign.

    At first I handled the whitespace on a case-by-case basis, removing it by hand or through regular expressions. Eventually, I realized this was a more serious problem and created an XSL transform to eliminate the whitespace as part of preprocessing. Because I used XSL that Xalan accepts (not that hard), the transform can be integrated into a FrameMaker structured application.
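For readers who want a feel for the general technique, here is a minimal XSLT 1.0 sketch. It is not the Scriptorium stylesheet, only an illustration: it drops whitespace-only text nodes everywhere except inside elements where whitespace is significant. The element names in the preserve list are an assumption based on common DITA preformatted elements; adjust them for your own structure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Remove whitespace-only text nodes from every element... -->
  <xsl:strip-space elements="*"/>

  <!-- ...except elements where whitespace is significant.
       This list is an assumption; extend it to match your DTD. -->
  <xsl:preserve-space elements="codeblock pre lines screen msgblock"/>

  <!-- Identity template: copy everything else through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

One trade-off to be aware of: this also removes a whitespace-only text node that sits between two inline elements, so a space separating adjacent inline tags can disappear. It also leaves leading and trailing whitespace inside ordinary text nodes alone, so a production stylesheet likely needs more than this.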

    I figured this whitespace problem must be affecting (and frustrating) more than a few of you out there, so I made the stylesheet available on the Scriptorium web site. I also wrote a white paper, “Removing XML whitespace in structured FrameMaker documents,” that describes the XSL that went into the stylesheet and how to integrate it with your FrameMaker structured applications.

    The white paper is available on the Scriptorium web site. Information about how to download the stylesheet is in the white paper.

    If the stylesheet and white paper are useful to you, let us know!

    Content management

    Adding a DOCTYPE declaration on XSL output

    In a posting a few weeks ago I discussed how to ignore the DOCTYPE declaration when processing XML through XSL. What I left unaddressed was how to add the DOCTYPE declaration back to the files. Several people have told me they’re tired of waiting for the other shoe to drop, so here’s how to add a DOCTYPE declaration.

    First off: the easy solution. If the documents you are transforming always use the same DOCTYPE, you can use the doctype-public and doctype-system attributes of the <xsl:output> element. When you specify these attributes, the XSL processor inserts the DOCTYPE declaration automatically.
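As a minimal sketch of the easy solution, assuming every input file is a DITA topic, the stylesheet below sets both attributes and otherwise copies the input unchanged. The identity template is only there to make the example complete; your real transform would have its own templates.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The processor writes the DOCTYPE declaration for you, using the
       name of the output root element plus these two identifiers. -->
  <xsl:output method="xml"
      doctype-public="-//OASIS//DTD DITA Topic//EN"
      doctype-system="topic.dtd"/>

  <!-- Identity template: copy the document through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the identifiers are fixed in the stylesheet, this only works when one stylesheet always produces one document type.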

    However, if the DOCTYPE varies from file to file, you’ll have to insert the DOCTYPE declaration from your XSL stylesheet. In DITA files (and in many other XML architectures), the DOCTYPE is directly related to the root element of the document being processed. This means you can detect the name of the root element and use standard XSL to insert a new DOCTYPE declaration.

    Before you charge ahead and drop a DOCTYPE declaration into your stylesheet, understand that a DOCTYPE declaration is not well-formed XML when it appears inside a template. If you try to emit it literally, your XSL processor will complain. Instead, you’ll have to:

    • Use entities for the less-than (“<” – “&lt;”) and greater-than (“>” – “&gt;”) signs, and
    • Disable output escaping so that the entities are actually emitted as less-than or greater-than signs (output escaping will convert them back to entities, which is precisely what you don’t want).

    There are at least two possible approaches for adding DOCTYPE to your documents: use an <xsl:choose> statement to select a DOCTYPE, or construct the DOCTYPE using the XSL concat() function.

    To insert the DOCTYPE declaration with an <xsl:choose> statement, use the document’s root element to select which DOCTYPE declaration to insert. Note that the “&lt;” and “&gt;” entities around each DOCTYPE declaration aren’t HTML errors in this post; they are what you need to use. Also note that the DOCTYPE statement text in this template is left-aligned so that the output DOCTYPE declarations will be left-aligned. Most parsers seem to tolerate whitespace before the DOCTYPE declaration, but I prefer to err on the side of caution:


    <xsl:template match="/">
    <!-- name(*) is the name of the root element; unlike node()[1], it is
    not thrown off by a comment or processing instruction that precedes
    the root element. -->
    <xsl:choose>
    <xsl:when test="name(*) = 'topic'">
    <xsl:text disable-output-escaping="yes">
    &lt;!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd"&gt;
    </xsl:text>
    </xsl:when>
    <xsl:when test="name(*) = 'concept'">
    <xsl:text disable-output-escaping="yes">
    &lt;!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"&gt;
    </xsl:text>
    </xsl:when>
    <xsl:when test="name(*) = 'task'">
    <xsl:text disable-output-escaping="yes">
    &lt;!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd"&gt;
    </xsl:text>
    </xsl:when>
    <xsl:when test="name(*) = 'reference'">
    <xsl:text disable-output-escaping="yes">
    &lt;!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd"&gt;
    </xsl:text>
    </xsl:when>
    </xsl:choose>
    <xsl:apply-templates select="node()"/>
    </xsl:template>

    The preceding example contains statements for the topic, concept, task, and reference topic types; if you use other topic types, you’ll need to add additional statements. Rather than write a statement for each DOCTYPE, a more general approach is to process the name of the root element and construct the DOCTYPE declaration using the XSL concat() function.


    <xsl:variable name="ALPHA_UC" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
    <xsl:variable name="ALPHA_LC" select="'abcdefghijklmnopqrstuvwxyz'"/>
    <xsl:variable name="NEWLINE" select="'&#x0A;'"/>

    <xsl:template match="/">
    <xsl:call-template name="add-doctype">
    <!-- name(*) is the name of the document's root element. -->
    <xsl:with-param name="root" select="name(*)"/>
    </xsl:call-template>
    <xsl:apply-templates select="node()"/>
    </xsl:template>

    <!-- Create a doctype based on the root element -->
    <xsl:template name="add-doctype">
    <xsl:param name="root"/>
    <!-- Create an init-cap version of the root element name. -->
    <xsl:variable name="initcap_root">
    <xsl:value-of
    select="concat(translate(substring($root,1,1),$ALPHA_LC,$ALPHA_UC),
    translate(substring($root,2),$ALPHA_UC,$ALPHA_LC))"
    />
    </xsl:variable>
    <!-- Build the DOCTYPE by concatenating pieces.
    Note that XSL syntax requires you to use the &quot; entities for
    quotation marks ("). -->

    <xsl:variable name="doctype"
    select="concat('!DOCTYPE ',
    $root,
    ' PUBLIC &quot;-//OASIS//DTD DITA ',
    $initcap_root,
    '//EN&quot; &quot;',
    $root,
    '.dtd&quot;')"/>
    <xsl:value-of select="$NEWLINE"/>
    <!-- Output the DOCTYPE surrounded by < and >. -->
    <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
    <xsl:value-of select="$doctype"/>
    <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
    <xsl:value-of select="$NEWLINE"/>
    </xsl:template>

    The one caveat about this approach is that it depends on every public ID following the same form (“-//OASIS//DTD DITA ” followed by the init-capped root element name). If the public IDs for your various DOCTYPE declarations differ from that form, those differences may complicate the template.
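One way to cope with such differences is to look up the exceptions before falling back to the init-cap rule. The sketch below is only an illustration: the template name public-id-label is hypothetical, and bookmap is used as the example exception because, to my knowledge, its public ID uses “BookMap” rather than “Bookmap”.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="ALPHA_UC" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="ALPHA_LC" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <!-- Hypothetical helper: return the label that follows "DTD DITA "
       in the public ID for a given root element name. -->
  <xsl:template name="public-id-label">
    <xsl:param name="root"/>
    <xsl:choose>
      <!-- Exception: init-capping "bookmap" would give "Bookmap". -->
      <xsl:when test="$root = 'bookmap'">BookMap</xsl:when>
      <!-- Default: init-cap the root element name. -->
      <xsl:otherwise>
        <xsl:value-of select="concat(
            translate(substring($root, 1, 1), $ALPHA_LC, $ALPHA_UC),
            substring($root, 2))"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

You would call this helper where the init-cap variable is built and add one <xsl:when> per exception, which keeps the general rule intact while containing the special cases in one place.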

    So there you have it: DOCTYPEs in a flash. Just remember to use disable-output-escaping="yes" and use entities where appropriate and you’ll be fine.
