Author Archive

Webcast: Transition to XML

April 17th, 2012 by Simon Bate

Simon Bate provides a planning framework for implementing an XML-based structured authoring environment.




From Boulder to Bangalore

March 5th, 2012 by Simon Bate

When I was a high school student in Boulder, Colorado, my first job was as a stock boy in an India-imports store. The store, Hamara Dukan, stocked all sorts of handicrafts and objets d’art from India including clothing, wood carvings, brass bowls and knickknacks, hand-printed bedspreads, incense, Kashmiri boxes, and thousands of other items. After working there for a couple of years, I acquired an appreciation of the things the country produced, but was always curious about the people and what it was like to be in India.



Webcast: Localization and the DITA Open Toolkit

September 14th, 2011 by Simon Bate

Out of the box, the DITA Open Toolkit (OT) looks like it’s localization-ready. It handles the XML attribute xml:lang. It contains strings for more than 50 localizations. So it would seem that all you have to do is specify the language in your DITA files and maps and you’re good to go…or are you? In this webcast, I’ll discuss some of the issues Scriptorium has encountered while generating localized output from the DITA Open Toolkit, and how we solved them.



Localization and the DITA Open Toolkit

August 22nd, 2011 by Simon Bate

Utilisez-vous le DITA Open Toolkit? Используете ли вы DITA Open Toolkit? あなたは、DITA Open Toolkitを使用していますか? (In French, Russian, and Japanese: “Do you use the DITA Open Toolkit?”)

In the past year or so we have handled a number of projects that involved localization and the DITA Open Toolkit.



Localization and the DITA Open Toolkit

August 22nd, 2011 by Simon Bate

Read in PDF (205 KB, 15 pages)

Out of the box, the DITA Open Toolkit (OT) looks like it’s localization-ready. It handles the XML attribute xml:lang. It contains strings for more than 50 localizations. So it would seem that all you have to do is specify the language in your DITA files and maps and you’re good to go…or are you?



Webcast: DITA OT essentials

July 27th, 2011 by Simon Bate

In this webcast, Simon Bate provides a “gentle introduction” to the DITA Open Toolkit (OT), the standard way to generate deliverables from DITA documents. The presentation shows how anyone can install the OT, takes a tour of its contents, and explains how the plugin architecture works.



FrameMaker, DITA, xrefs: I could tell you, but…

July 8th, 2011 by Simon Bate

Modifying FrameMaker cross-reference formats: it’s basic and one of the cool things about FrameMaker. But not if you’re editing DITA files using FrameMaker 9 or 10.




Inside the DITA Open Toolkit

May 18th, 2011 by Simon Bate

The DITA Open Toolkit (OT) is the standard way to generate deliverables from DITA documents. To the casual user, the OT may seem intimidating, but in reality, it’s fairly easy to download, install, and run.



Early ExtendScript Experiences

May 4th, 2011 by Simon Bate

FrameScript has long been our go-to tool for FrameMaker automation.



Webhelp for DITA

February 11th, 2011 by Simon Bate

A few weeks ago my esteemed colleague, David Kelly, published a blog post about his DITA Open Toolkit (OT) plugin that simplifies the customization of PDF output. In the post, David mentioned that I would soon be writing about a plugin to provide HTML-based web help. I found a bit of time to write about Scriptorium Help, so here is that post.




Conquering the Mark of the Web (DITA OT version)

December 22nd, 2010 by Simon Bate

Whew! Now I know how St. George felt after slaying the dragon. I’ve defeated the Mark of the Web beast and have lived to tell about it.



Webcast: XSL techniques for XML-to-XML transformations

October 6th, 2010 by Simon Bate

In this webcast, Simon Bate leads viewers through the key steps in using XSL (Extensible Stylesheet Language) to perform XML-to-XML conversions, a process that differs from more traditional XML-to-PDF and XML-to-HTML conversions.




Using Ant to find a needle in a haystack

July 30th, 2010 by Simon Bate

Many content management systems (CMSs) take over the responsibility of file naming. For the most part, this is fine and is actually necessary for maintaining cross-references and conrefs within the CMS. When you use the CMS to build a DITA map, the CMS uses its own names in the <topicref> elements.



Tech Tips: Quick Word to DITA table conversion

July 21st, 2010 by Simon Bate

The other day I had to convert a large table from Word to DITA. I started looking at Word XML output and thought about transforming it with XSL (which I have done in the past), but that seemed like too much trouble for this document. Then I remembered a technique an old SQL coder showed me for loading large amounts of data into a SQL table. I realized this technique could be readily adapted to DITA.



Webcast: DITA Specialization 101

July 1st, 2010 by Simon Bate

Simon Bate of Scriptorium Publishing introduces specialization in the DITA Open Toolkit and walks viewers through the fundamentals.



Webcast: XMetaL 6 overview

April 21st, 2010 by Simon Bate

JustSystems XMetaL 6.0 adds a handful of new features for DITA users. The most interesting addition is a tripane Help output format (which JustSystems calls “WebHelp”). The main focus of this webcast is on customizing the appearance of WebHelp. Topics include modifying navigation panes, changing fonts, and adding headers and footers.



A Bit Retro: Top Ten FrameMaker Conversion Tips

March 15th, 2010 by Simon Bate

A few months back Samartha Vashishtha from the Adobe India office created a PDF of his “Top Ten FrameMaker Tips.” Most of the tips were for using the newer features in FrameMaker 9.0. As I read the article, I thought about my own Top Ten FrameMaker tips (regardless of version).

Once I had my list written down, I realized that many of my tips relate to converting documents from other formats to FrameMaker. In light of DITA and other newer document encoding schemes, it felt a bit retro to be posting the list. After all, some of Scriptorium’s business involves converting documents from FrameMaker.

However, I also realize that there are many organizations using FrameMaker, and those groups still need to import documents from various formats to FrameMaker. (And I’m always surprised by the number of people who say “I wish I had known about #1 a long time ago.”)

So, here we have it: Simon’s Top Ten FrameMaker Conversion tips:

  1. Improve pasting from MS Word. When copying from MS Word and pasting to out-of-the-box FrameMaker, the text is pasted as an annoying OLE object; you have to use the equally annoying Edit > Paste special to paste as text. Relax; there’s a fix. Find your maker.ini file (in Program Files\Adobe\FrameMaker x), make a copy of the file (just in case you mess something up), edit maker.ini using a text editor (such as Notepad), find the line that begins with “ClipboardFormatsPriorities=” and move the word “TEXT” to the beginning of the list (so the beginning of the line reads “ClipboardFormatsPriorities=TEXT, FILE, …”). Save the file, then restart FrameMaker. Now the text is inserted as text when you paste. It’s a miracle!
  2. Copy and paste isn’t just for text. You can copy the paragraph or character style of the current text selection using Edit > Copy Special > Paragraph or Edit > Copy Special > Character (that’s ESC eyp or ESC eyc, if you’re an Escape code kind of person). To apply the copied style, select some text and use paste (CTRL+v). With a style on the clipboard, you can quickly scroll through a document and apply the style where it’s needed. In addition to paragraph and character styles, you can copy conditional text settings, table column widths (see next), and attributes (structured FrameMaker).
  3. Copy table column widths. To copy column widths, use Edit > Copy Special > Table Column Width (ESC eyw). This can be applied in two ways:

    When converting documents, I often encounter tables that are so wide that many of the columns don’t appear on the page. The fast solution is to make the first column relatively narrow, copy the first column width, select the entire table, and paste. The single width is applied to all columns in the table. With any luck, the table will now fit on the page and you can resize the columns as needed. (Yes, you can use the Table > Resize Columns command; I just find this to be faster.)

    The second great use of copy column width is that it allows you to apply a consistent set of column widths to a number of tables. The only downside is that copy column width can store only one column width at a time. You have to copy the width of the first column, apply it to all first columns, then copy the width of the second column, apply it, and so on.

  4. There’s power in Change By Pasting. The Copy Special features become incredibly powerful when you combine them with Change: By Pasting in the Find/Change dialog. To quickly apply a character style to a frequently occurring string: search for the string, use the standard methods to apply a new style to the text, then copy the style with Edit > Copy Special > Character Format (ESC eyc). In the Find/Change dialog, set the Change field to “By Pasting.” Now search for the next instance of the string and use the Change button to apply the change (as a test). After a couple more tests, use Change All to change the rest of the occurrences.
  5. Use SHIFT when resizing table columns. At some point in your document conversion you’ll need to adjust your table columns. Normally, resizing a column width using the mouse causes all columns to the right to move a corresponding distance (and causes the table width to change). To move a column border between two columns without affecting the rest of the table, hold the SHIFT key down before clicking on the column resize handles. Now when you drag, you only change the width of the column and its neighbor to the right; the overall table width remains unchanged.
  6. Search for “\P\p” to eliminate blank lines. It’s inevitable: when you’re converting a document from Microsoft Word, you’re going to find that someone used blank paragraphs to create vertical space. But how do you search and replace blank lines without messing up your formatting? If you search for “\p\p” and replace it with “\p”, you’ll eliminate the blank line, but the first paragraph acquires the formatting of the following paragraph, which is not good. Instead, search for “\P\p” and replace it with “” (an empty string). The capital “\P” means beginning of a paragraph and lowercase “\p” means the end of paragraph.
    Note that you might also need to make a second pass, turning on wildcards and searching for “\P|\p”. With wildcards enabled, the vertical bar means any number of spaces or punctuation.
  7. Use File > Import > Formats to remove manual page breaks. You may have noticed the Remove Manual Page Breaks checkbox in the Import Formats dialog. Well, it works even when you’re “importing” formats from a file onto itself. To start afresh on a pagination pass, use File > Import > Formats (or ESC fio), make sure Import from Document is set to “Current”, clear all Import and Update checkboxes, check the Remove Manual Page Breaks checkbox, and click Import. If you’re daring (or very confident), you can use the same technique to clear formatting overrides.
  8. Drag and drop works at the file level. A fast way to build FrameMaker books is to drag and drop the individual FrameMaker files from Windows Explorer into a FrameMaker book file. For even faster building, you can select several FrameMaker files in a folder and drag them all at once into the FrameMaker book. Once the files are in the book file, use drag and drop to rearrange them as you need.
  9. MIF stabilizes documents. When importing large Microsoft Word documents with many images, the resulting FrameMaker file can sometimes be unstable. You can eliminate a number of problems by immediately saving as MIF, closing the file, re-opening the MIF file, and saving as a FrameMaker file. Saving as MIF is often a good first step for correcting corrupt FrameMaker documents.
  10. Learn about FrameMaker add-ons. For a relatively small amount of money, FrameMaker add-ons add real power to FrameMaker. Some of my favorites are:

I’m sure as soon as I hit the “Post” button, I’ll think of number 11…or 12. If I do, I’ll let you know.

What are your favorite FrameMaker conversion tricks?
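If you set up FrameMaker on several machines, tip #1 can be scripted. The Python sketch below is a hypothetical helper, not a FrameMaker feature. It assumes, as the tip describes, that the ClipboardFormatsPriorities= line holds a comma-separated list, and it assumes maker.ini uses a single-byte encoding (it reads and writes Latin-1 so every byte round-trips). Like the manual procedure, it makes a backup copy before editing.

```python
import shutil
from pathlib import Path

KEY = "ClipboardFormatsPriorities="


def promote_text_format(ini_text: str) -> str:
    """Return ini_text with TEXT moved to the front of the ClipboardFormatsPriorities= list."""
    lines = ini_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if not line.startswith(KEY):
            continue
        value = line[len(KEY):].rstrip("\r\n")
        ending = line[len(KEY) + len(value):]  # keep the original line ending
        formats = [f.strip() for f in value.split(",") if f.strip()]
        if "TEXT" in formats:  # only reorder; never add a format that isn't listed
            formats.remove("TEXT")
            formats.insert(0, "TEXT")
            lines[i] = KEY + ", ".join(formats) + ending
    return "".join(lines)


def patch_maker_ini(path: Path) -> None:
    """Back up maker.ini (as maker.ini.bak), then rewrite it with TEXT as the preferred paste format."""
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    # newline="" preserves CRLF line endings; Latin-1 (an assumption) round-trips any byte.
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(promote_text_format(text))
```

For example, call patch_maker_ini(Path(r"C:\Program Files\Adobe\FrameMaker x\maker.ini")) with your actual FrameMaker folder in place of FrameMaker x, then restart FrameMaker. Writing under Program Files may require administrator rights.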



Webcast: DITA features in oXygen XML editor

February 17th, 2010 by Simon Bate

This webcast offers an overview of the oXygen XML editor and demonstrates DITA-specific features, including inserting cross-references and conrefs, working with map files, applying conditions, and generating output.



White paper on whitespace (and removing it)

January 15th, 2010 by Simon Bate

When I first started importing DITA and other XML files into structured FrameMaker, I was surprised by the excessive whitespace that appeared in the files. Even more surprising (in FrameMaker 8.0) were the red comments displayed via the EDD that said that some whitespace was invalid (these no longer appear in FrameMaker 9).

The whitespace was visible because of an odd decision by Adobe to handle all XML whitespace as if it were significant. (XML divides the world into significant and insignificant whitespace; most XML tools treat whitespace as insignificant except where necessary…think <codeblock> elements.) This approach to whitespace exists in both FrameMaker and InDesign.

At first I handled the whitespace on a case-by-case basis, removing it by hand or through regular expressions. Eventually, I realized this was a more serious problem and created an XSL transform to eliminate the whitespace as part of preprocessing. By using XSL that was acceptable to Xalan (not that hard), the transform can be integrated into a FrameMaker structured application.
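The white paper’s transform is XSL so that it can run inside a FrameMaker structured application. To show the underlying idea outside FrameMaker, here is a rough Python sketch using the standard library’s xml.etree.ElementTree; it is not the Scriptorium stylesheet. It assumes two hypothetical element lists: PRESERVE (elements such as <codeblock> whose whitespace is content) and ELEMENT_ONLY (containers whose children are only elements, so whitespace between those children is just indentation). Elsewhere, runs of whitespace collapse to one space, and leading and trailing whitespace is trimmed from elements that sit directly inside an element-only container (paragraphs, list items, titles).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical element lists; extend them to match your DTD.
PRESERVE = {"codeblock", "pre", "lines", "msgblock", "screen"}  # whitespace is content
ELEMENT_ONLY = {"topic", "concept", "task", "reference", "body", "conbody",
                "taskbody", "refbody", "ul", "ol", "sl", "dl", "dlentry",
                "steps", "step", "table", "tgroup", "thead", "tbody", "row"}

_WS_RUN = re.compile(r"\s+")


def _collapse(text):
    """Collapse each run of whitespace to a single space (None and "" pass through)."""
    return _WS_RUN.sub(" ", text) if text else text


def strip_whitespace(elem, is_block=True):
    """Remove insignificant whitespace from elem and its descendants, in place."""
    if elem.tag in PRESERVE:
        return  # leave preformatted content exactly as it is
    children = list(elem)
    if elem.tag in ELEMENT_ONLY:
        # Whitespace between child elements is only indentation: drop it.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        for child in children:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    else:
        # Mixed content: keep word spacing, but collapse runs of whitespace.
        elem.text = _collapse(elem.text)
        for child in children:
            child.tail = _collapse(child.tail)
        if is_block:
            # A block shouldn't start or end with a space.
            if elem.text:
                elem.text = elem.text.lstrip()
            last = children[-1] if children else None
            if last is not None and last.tail:
                last.tail = last.tail.rstrip()
            elif last is None and elem.text:
                elem.text = elem.text.rstrip()
    for child in children:
        # Children of an element-only container are blocks; all others are inline.
        strip_whitespace(child, is_block=elem.tag in ELEMENT_ONLY)
```

Keeping the single space between inline elements (for example, <b>a</b> <i>b</i>) is the reason for the two element lists; simply deleting every whitespace-only text node would run those words together. Note that ElementTree drops the DOCTYPE declaration and comments when it parses, so this suits preprocessing rather than round-tripping files.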

I figured this whitespace problem must be affecting (and frustrating) more than a few of you out there, so I made the stylesheet available on the Scriptorium web site. I also wrote a white paper, “Removing XML whitespace in structured FrameMaker documents,” that describes the XSL that went into the stylesheet and how to integrate it with your FrameMaker structured applications.

The white paper is available on the Scriptorium web site. Information about how to download the stylesheet is in the white paper.

If the stylesheet and white paper are useful to you, let us know!



Removing XML whitespace in structured FrameMaker documents

January 15th, 2010 by Simon Bate

An ongoing frustration with using structured FrameMaker to generate great PDF files from DITA or other XML files is that structured FrameMaker does not ignore whitespace, resulting in excess spaces in paragraphs and table cells and unnecessary space between paragraphs. These spaces are annoying and time-consuming to remove. This paper describes an XSL transform that removes whitespace from XML documents. You can incorporate the transform into a FrameMaker structured application to remove whitespace automatically.




Hacking the DITA Open Toolkit

December 31st, 2009 by Simon Bate


The Darwin Information Typing Architecture (DITA) defines a set of XML elements for creating and organizing content. However, the DITA specification is silent on transforming DITA into user-readable documentation. The DITA Open Toolkit (DITA OT) fills that gap, providing a mechanism for transforming DITA content into multiple output formats, including HTML and PDF. The DITA OT formatting for both of these formats is basic, at best. This paper focuses on the changes you can make to the DITA OT HTML output to make it more attractive. These modifications include changes to cascading stylesheets (CSS), headers and footers, and more advanced customizations. The paper also illustrates how you can create content-specific elements through DITA specialization.




Webcast: Hacking the DITA Open Toolkit

December 31st, 2009 by Simon Bate

The DITA specification is silent on how to transform DITA into user-readable documentation. The DITA Open Toolkit (DITA OT) fills that gap, providing a mechanism for transforming DITA content into multiple output formats, including HTML and PDF. The DITA OT formatting for both of these formats is basic, at best. Usually people want more from the output: they want it to be more attractive or conform to their corporate look and feel (or both).



Webcast: Dynamic text display: a space-saving alternative to conditional processing

December 31st, 2009 by Simon Bate

We needed to generate a Help set from DITA sources that applied to multiple products. However, serious space constraints prevented us from using standard DITA conditional processing to create multiple, product-specific versions of the Help; there was only room for one copy of the Help. Our solution was to create a single Help set in which selected content is displayed when the Help is opened.

In this webcast, we’ll show you how we used the DITA Open Toolkit to create a Help set with dynamic text display. The webcast introduces some minor DITA Open Toolkit modifications and several client-side JavaScript techniques that you can use to implement dynamic text display in HTML files. Only minimal programming skills are necessary. Simon Bate, Senior Technical Consultant, will show you what to modify and how to do it.



Adding a DOCTYPE declaration on XSL output

December 1st, 2009 by Simon Bate

In a posting a few weeks ago I discussed how to ignore the DOCTYPE declaration when processing XML through XSL. What I left unaddressed was how to add the DOCTYPE declaration back to the files. Several people have told me they’re tired of waiting for the other shoe to drop, so here’s how to add a DOCTYPE declaration.

First off: the easy solution. If the documents you are transforming always use the same DOCTYPE, you can use the doctype-public and doctype-system attributes in the <xsl:output> directive. When you specify these attributes, XSL inserts the DOCTYPE automatically.

However, if the DOCTYPE varies from file to file, you’ll have to insert the DOCTYPE declaration from your XSL stylesheet. In DITA files (and in many other XML architectures), the DOCTYPE is directly related to the root element of the document being processed. This means you can detect the name of the root element and use standard XSL to insert a new DOCTYPE declaration.
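If you want to check which DOCTYPE a file needs outside XSL (or restore declarations by post-processing output instead), the same root-element decision can be sketched in Python. This is not the XSL approach described in this post; the helper names are hypothetical, and the DOCTYPES table just repeats the four declarations used below. ElementTree’s parser doesn’t fetch the external DTD, so the DOCTYPE in the source file doesn’t get in the way.

```python
import xml.etree.ElementTree as ET

# The same four declarations as the <xsl:choose> example; add other topic types as needed.
DOCTYPES = {
    "topic": '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">',
    "concept": '<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">',
    "task": '<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">',
    "reference": '<!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd">',
}


def root_element_name(stream) -> str:
    """Return the name of the root element of an XML byte stream."""
    # The first start-element event is always the root element, so comments
    # and processing instructions in the prolog are skipped automatically.
    for _event, elem in ET.iterparse(stream, events=("start",)):
        return elem.tag
    raise ValueError("no root element found")


def doctype_for(stream):
    """Return the DOCTYPE declaration for the stream's root element, or None if unknown."""
    return DOCTYPES.get(root_element_name(stream))
```

Returning None for unknown root elements mirrors an <xsl:choose> with no matching <xsl:when>: nothing is inserted.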

Before you charge ahead and drop a DOCTYPE declaration into your files, understand that the DOCTYPE declaration is not valid XML. If you try to emit it literally, your XSL processor will complain. Instead, you’ll have to:

  • Use entities for the less-than (“<” – “&lt;”) and greater-than (“>” – “&gt;”) signs, and
  • Disable output escaping so that the entities are actually emitted as less-than or greater-than signs (output escaping will convert them back to entities, which is precisely what you don’t want).

There are at least two possible approaches for adding DOCTYPE to your documents: use an <xsl:choose> statement to select a DOCTYPE, or construct the DOCTYPE using the XSL concat() function.

To insert the DOCTYPE declaration with an <xsl:choose> statement, use the document’s root element to select which DOCTYPE declaration to insert. Note that the entities “&gt;” and “&lt;” aren’t HTML errors in this post; they are what you need to use. Also note that the DOCTYPE statement text in this template is left-aligned so that the output DOCTYPE declarations will be left-aligned. Most parsers seem to tolerate whitespace before the DOCTYPE declaration, but I prefer to err on the side of caution:

    <xsl:template match="/">
        <!-- name(*) is the name of the root element, even if comments or
             processing instructions precede it. -->
        <xsl:choose>
          <xsl:when test="name(*) = 'topic'">
              <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd"&gt;
</xsl:text>
          </xsl:when>
          <xsl:when test="name(*) = 'concept'">
              <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"&gt;
</xsl:text>
          </xsl:when>
          <xsl:when test="name(*) = 'task'">
              <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd"&gt;
</xsl:text>
          </xsl:when>
          <xsl:when test="name(*) = 'reference'">
              <xsl:text disable-output-escaping="yes">
&lt;!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd"&gt;
</xsl:text>
          </xsl:when>
        </xsl:choose>
        <xsl:apply-templates select="node()"/>
    </xsl:template>

The preceding example contains statements for the topic, concept, task, and reference topic types; if you use other topic types, you’ll need to add additional statements. Rather than write a statement for each DOCTYPE, a more general approach is to process the name of the root element and construct the DOCTYPE declaration using the XSL concat() function.

    <xsl:variable name="ALPHA_UC" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
    <xsl:variable name="ALPHA_LC" select="'abcdefghijklmnopqrstuvwxyz'"/>
    <xsl:variable name="NEWLINE" select="'&#x0A;'"/>

    <xsl:template match="/">
        <xsl:call-template name="add-doctype">
            <xsl:with-param name="root" select="name(*)"/>
        </xsl:call-template>
        <xsl:apply-templates select="node()"/>
    </xsl:template>

    <!-- Create a DOCTYPE based on the root element. -->
    <xsl:template name="add-doctype">
        <xsl:param name="root"/>
        <!-- Create an init-cap version of the root element name. -->
        <xsl:variable name="initcap_root">
            <xsl:value-of
                select="concat(translate(substring($root,1,1),$ALPHA_LC,$ALPHA_UC),
                               translate(substring($root,2  ),$ALPHA_UC,$ALPHA_LC))"
            />
        </xsl:variable>
        <!-- Build the DOCTYPE by concatenating pieces.
             Because the select attribute is delimited by quotation marks,
             use the &quot; entity for each quotation mark (") in the result. -->
        <xsl:variable name="doctype"
            select="concat('!DOCTYPE ',
                           $root,
                           ' PUBLIC &quot;-//OASIS//DTD DITA ',
                           $initcap_root,
                           '//EN&quot; &quot;',
                           $root,
                           '.dtd&quot;')"/>
        <xsl:value-of select="$NEWLINE"/>
        <!-- Output the DOCTYPE surrounded by < and >, without escaping. -->
        <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
        <xsl:value-of select="$doctype" disable-output-escaping="yes"/>
        <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
        <xsl:value-of select="$NEWLINE"/>
    </xsl:template>

The one caveat about this approach is that it depends on a consistent portion of the public ID form ("-//OASIS//DTD DITA "). If there are differences in the public ID for your various DOCTYPE declarations, those differences may complicate the template.
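One way to handle that caveat is a small table of exceptions. The Python sketch below reproduces the string-building logic of the add-doctype template and adds a hypothetical NAME_OVERRIDES table for root elements whose public-ID name isn’t a simple init-cap of the element name. The bookmap entry reflects the usual “BookMap” spelling in the OASIS public ID, but verify it against the DTDs you actually use.

```python
# Root elements whose public-ID name isn't a simple init-cap (assumed; check your DTDs).
NAME_OVERRIDES = {"bookmap": "BookMap"}


def build_doctype(root: str) -> str:
    """Build a DITA DOCTYPE like the add-doctype template, honoring NAME_OVERRIDES."""
    # Same init-cap rule as the XSL translate() calls: first letter upper, rest lower.
    name = NAME_OVERRIDES.get(root, root[:1].upper() + root[1:].lower())
    return f'<!DOCTYPE {root} PUBLIC "-//OASIS//DTD DITA {name}//EN" "{root}.dtd">'
```

The same idea could carry over to the XSL template by computing initcap_root inside an <xsl:choose> that tests for the exceptions first.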

So there you have it: DOCTYPEs in a flash. Just remember to use disable-output-escaping="yes" and use entities where appropriate and you'll be fine.



Ignoring DOCTYPE in XSL Transforms using Saxon 9B

September 23rd, 2009 by Simon Bate

Recently I had to write some XSL transforms in which I wanted to ignore the DOCTYPE declarations in the source XML files. In one case, I didn’t have access to the DTD (and the files wouldn’t have validated even if I did). In the other case, the XML files were DITA files, but I had no need or interest in validating the files; I simply needed to run a transform that modified some character data in the files.

In the first case, I ended up writing a couple of SED scripts that removed and re-inserted the DOCTYPE declaration. By the time I encountered the second case, I wanted to do something less ham-fisted, so I started investigating how to direct Saxon to ignore the DOCTYPE declaration.

My first thought was to use the -x switch in Saxon. Perhaps I didn’t use it correctly, but I couldn’t get it to work. Even though I was using a non-validating parser (Piccolo), Saxon kept telling me that the DTD couldn’t be found.

I went back to the drawing board (aka Google) and found a note from Michael Kay that said, “to ignore the DTD completely, you need to use a catalog that redirects the DTD reference to some dummy DTD.” Michael provided a link to a very useful page in the Saxon Wiki that discussed using a catalog with Saxon. After a bit of experimentation, I got it working correctly. In this blog post, I’ve distilled the information to make it useful to others who need to ignore the DOCTYPE in their XSL.

Before I describe the catalog implementation, I’d like to point out a simple solution. This solution works best when a set of XML files is in a single directory and all files use the same DOCTYPE declaration in which the system ID specifies a file:

    <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">

In this case, you don’t need a catalog. It’s easier to create an empty file named “topic.dtd” (a dummy DTD) and save it in the same directory as the XML files. The XML parser looks first for the system ID; if it finds a DTD file, it uses it. Case closed.

However, there are many cases in which this simple solution doesn’t work. The system ID (“topic.dtd” in the previous example) might specify a path that cannot be reproduced on your machine…or the XML files could be spread across multiple directories…or there could be many different DOCTYPEs…or…

In these cases, it makes more sense to set up a catalog file. To specify a catalog with Saxon, you must use the XML Commons Resolver from Apache (resolver.jar). You can download the resolver from SourceForge. The good thing is, if you have the DITA Open Toolkit installed on your machine, you already have a copy of the resolver.jar file. The file is in %DITA-OT%\lib\resolver.jar. You specify the class path for the resolver in the Java command using the -cp switch (shown below).

The resolver requires you to specify a catalog.xml file, in which you map the public ID (or system ID) in the DOCTYPE declaration to a local DTD file. The catalog.xml file I created looks like this:

    <catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
        <public publicId="-//OASIS//DTD DITA Topic//EN" uri="dummy.dtd"/>
        <public publicId="-//OASIS//DTD DITA Concept//EN" uri="dummy.dtd"/>
        <public publicId="-//OASIS//DTD DITA Task//EN" uri="dummy.dtd"/>
        <public publicId="-//OASIS//DTD DITA Reference//EN" uri="dummy.dtd"/>
    </catalog>

Note that the uri attribute in each entry points to a dummy DTD (an empty file). The file path used for the dummy.dtd file is relative to the location of the catalog file.
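To make the mapping concrete, here is a small Python sketch that reads a catalog like the one above and resolves each publicId to a file path relative to the catalog’s own folder. It only illustrates what the catalog says; it is not a replacement for the Apache resolver that Saxon uses. The function name is hypothetical, it handles only <public> entries, it ignores the prefer attribute, and it assumes each uri is a plain relative path (real catalogs can also use system entries, xml:base, and full URIs).

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def public_id_map(catalog_file) -> dict:
    """Map each publicId in an OASIS XML catalog to a path relative to the catalog's folder."""
    catalog_path = Path(catalog_file).resolve()
    base = catalog_path.parent  # relative uri values resolve against the catalog location
    root = ET.parse(catalog_path).getroot()
    return {
        entry.get("publicId"): base / entry.get("uri")
        for entry in root.iter(f"{{{CATALOG_NS}}}public")
    }
```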

Putting it all together, I created a DOS batch file to run Java and invoke Saxon:

    java -cp c:\saxon9\saxon9.jar;C:\DITA-OT1.4.3\lib\resolver.jar ^
        -Dxml.catalog.files=catalog.xml ^
        net.sf.saxon.Transform ^
        -r:org.apache.xml.resolver.tools.CatalogResolver ^
        -x:org.apache.xml.resolver.tools.ResolvingXMLReader ^
        -y:org.apache.xml.resolver.tools.ResolvingXMLReader ^
        -xsl:my_transform.xsl ^
        -s:my_content.xml

The Java -cp switch adds class paths for the saxon9.jar and resolver.jar files. The -D switch sets the system property xml.catalog.files to the location of the catalog.xml file.

The switches following the Java class (net.sf.saxon.Transform) are Saxon switches.

  • -r – class of the resolver
  • -x – class of the source file parser
  • -y – class of the stylesheet parser

Note, I’m using Windows (DOS) syntax here. If you are using Unix (Linux, Mac), separate the paths in the class path with a colon (:) and use the backslash (\) as a line continuation character.
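Another way to sidestep the Windows-versus-Unix differences is to build the command as an argument list: there are no continuation characters at all, and the class path separator comes from os.pathsep. The Python sketch below assembles the same Java invocation as the batch file; the function name is hypothetical, and the jar paths are whatever exists on your machine.

```python
import os


def saxon_command(saxon_jar, resolver_jar, catalog, xsl, source):
    """Build the Java/Saxon argument list from this post, using the platform's class path separator."""
    tools = "org.apache.xml.resolver.tools"
    return [
        "java",
        "-cp", os.pathsep.join([saxon_jar, resolver_jar]),  # ';' on Windows, ':' on Unix
        f"-Dxml.catalog.files={catalog}",                    # must precede the main class
        "net.sf.saxon.Transform",
        f"-r:{tools}.CatalogResolver",                       # URI resolver
        f"-x:{tools}.ResolvingXMLReader",                    # source file parser
        f"-y:{tools}.ResolvingXMLReader",                    # stylesheet parser
        f"-xsl:{xsl}",
        f"-s:{source}",
    ]
```

To run it, pass the list to subprocess.run(cmd, check=True); because no shell is involved, paths containing spaces need no extra quoting.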

When you run Saxon this way, you’ll notice two things: first, Saxon doesn’t complain about the DTD (yay!), and second, there is no DOCTYPE declaration in the output. I’ll address how to add the DOCTYPE declaration back to the output XML file in my next blog post.



Converting MS Word to DITA

August 31st, 2009 by Simon Bate

This is a short reminder that it’s not too late to sign up for our webinar on converting documents from Microsoft Word to DITA. This webinar is the second in our series of webinars co-presented with JustSystems on “Things to Consider When Moving to DITA.”

To sign up, follow this link to the registration page.

If you missed last week’s webinar on converting unstructured FrameMaker documents to DITA, you can view a recording by visiting the JustSystems webinars page (at http://na.justsystems.com/webinars.php) and scrolling down to the “Archived Webinars” section.

Hope to “see” you tomorrow!



New Webinar Series: Things to consider when moving to DITA

August 12th, 2009 by Simon Bate

Scriptorium and JustSystems are announcing a three-webinar series on preparing to use DITA.

The first two webinars in the series describe the age-old problem of converting legacy content into DITA. Because a great deal of unstructured content is in either Adobe FrameMaker or Microsoft Word, we’re dedicating one webinar to converting unstructured FrameMaker to DITA and the other to converting Microsoft Word to DITA.

The third webinar describes various re-use strategies you can apply to your DITA content.

The dates and times for the conversion webinars are:

  • Converting Unstructured FrameMaker to DITA – August 25, 2:00pm Eastern time.
  • Converting Microsoft Word to DITA – September 1, 2:00pm Eastern time.

The date and time for the third webinar (DITA reuse strategies) will be announced toward the end of August.

All of the webinars in the series are free, but you do have to register before attending. To sign up, follow this link to the JustSystems web site:

http://na.justsystems.com/webinars.php

Register now!



Don’t type, drag to the cmd window

February 20th, 2009 by Simon Bate

I spend a good deal of time with a Windows cmd.exe window open on my desktop. If I’m not running the DITA OT, I’m testing some Perl script, or Ant, or Python, or who knows.

A few years ago (in the Windows 98 days), I discovered a nifty cmd window trick. People are consistently amazed when I demonstrate it to them. Now I’m going to share it with you.

Say you need to change directory to some long and gnarly path name. You could type the whole thing in. Or, if you have Windows Explorer open on your desktop, you can:

  1. Type “cd ” in the cmd window (the space is important).
  2. Go to Windows Explorer and find the folder you want to navigate to.
  3. Drag and drop the folder from Windows Explorer to the cmd window.

Hey presto! The path name is copied to the cmd window. What’s more, if there are spaces in the path, the path is automatically quoted.

Now you can click in the cmd window and press Enter to perform the command.

Cool! No more typing long path names for this ToolSmith.

This works for filenames too. If I’m running a Perl script that needs to work on a file way down my directory tree, I type “perl myScriptName.pl “, then drag and drop the file name from Windows Explorer into my cmd window.

I’ll keep adding more ToolSmith’s Tricks as I use them. What’s your favorite trick?



DITA Open Toolkit: Not quite so scary

January 30th, 2009 by Simon Bate

Last October, I wrote about being scared. Today, I want to talk about something that seems somewhat scary, but it isn’t…or shouldn’t be. It may be daunting, but it’s not scary. That “thing” is the DITA Open Toolkit (DITA OT).

The DITA OT is the most common way to convert DITA content into output. The output from the DITA OT is just about as ordinary as it gets. People’s first reaction on seeing the output is: how can I make it look the way I want?

Scriptorium’s new white paper Hacking the DITA OT explains many of the techniques you can use to change the appearance of the DITA OT HTML output. These range from CSS changes to DITA specialization.

When people approach these kinds of changes, they often get lost in the structures and frameworks used by the DITA OT. What they forget is that it’s perfectly acceptable to start out by making quick, proof-of-concept changes. This white paper shows you how to make these sorts of “hacks.” The paper also explains how you can go back and formalize your hacks into more elegant, framework-based changes.

As I was writing the paper, I kept slipping up and saying “it’s easy to…” or “you simply…”. My faithful editor corrected me, saying “it might be easy for you, but it might not be easy for your readers.” He’s right. But after reading this paper, you might find the DITA OT just a little less scary.


DITAlini and Chickpeas

January 9th, 2009 by Simon Bate

There are times when my silly brain notices something and it just won’t let go. A few weeks ago I was leafing through a “foodie” magazine and I saw a reference to a pasta known as ditalini (“little thimbles”). Because I’ve spent the past year working with and teaching about DITA (the Darwin Information Typing Architecture), the letters “dita” in “ditalini” caught my eye. “Hmmm,” I thought, “Company potluck coming up in a couple of months…I can’t pass this one up.”

I was prepared for a major search, so I was quite surprised to find a box of ditalini on the shelves of my regular grocery store. Once I had found it, the next question was: what to do with it? Perhaps a simple pasta and beans recipe. When cooked, each piece of ditalini is about the size of a chickpea (garbanzo, ceci), so that’s a natural pairing. I found a recipe and modified it a bit to my liking.

2 Tbsp olive oil
1-2 cloves garlic, minced
2 (15 oz.) cans chickpeas (NOT drained)
1 (14 oz.) can diced tomatoes, drained
1/2 tsp each thyme, rosemary, oregano, basil, marjoram
1 tsp salt
1/4 tsp pepper
1 cup ditalini (uncooked)

Heat the oil in a 3-4 quart saucepan. Add the garlic and allow to brown slightly.

Add the chickpeas and their liquid, the tomatoes, the herbs, the salt and pepper, and the ditalini. Bring to a boil, then let simmer for 15 minutes, stirring occasionally. Season to taste and serve.

Grate some Parmigiano-Reggiano over each serving.


I showed the box when I unveiled my dish at the potluck. It received all the appropriate groans.

Now I have to figure out what to do next year. Has anyone else found some foodstuff with a similar relationship to our profession or industry?



It’s a Right

November 4th, 2008 by Simon Bate

I’ll be brief. I’m a naturalized citizen of the United States. I became a citizen by choice. What was the difference between my remaining a resident alien and being a citizen? Simply the right to vote.

Every November (and sometimes in the spring during primary season), I have the privilege of participating in our election process. I vote with pride. I vote with respect. It’s my duty.

Whether you’re a naturalized citizen or were born in the US and have the innate right, exercise that right. Vote today.



Looking Fear Straight in the Eye

November 3rd, 2008 by Simon Bate

Have you ever been really scared? I don’t mean just the Halloween kinda scared, but really scared. That’s how I felt at the Burlington Marriott when the hotel employee delivered the box containing the workbooks for my Introduction to XMetaL and DITA workshop. He stood in the doorway, smiled, and handed me a very beat up, bent, folded, spindled, and mutilated FedEx box.

The box looked like the driver had had a flat on Route 128 and used it to prevent the truck from rolling back while jacking up the front end. It was nice and damp too. With much trepidation, I opened the box and — to my relief — found that the materials were undamaged. Whew.

Following that, Wednesday’s all-day workshop on XMetaL and DITA was smooth sailing. OK, we had a bit of a problem with power strips, but the helpful DocTrain folks got that taken care of. In retrospect, many of the questions I fielded in the workshop weren’t so much about DITA or XMetaL themselves. Instead, many of the questions were about generating output. The fact is that unless you’re willing to spend some quality time with CSS and the DITA Open Toolkit, your output from DITA will look very generic. XMetaL has a number of hooks that ease some of the pain in generating XHTML output. But even those hooks won’t save you from FO issues if you want to generate PDF output.

In my presentation on Thursday comparing DITA support in XMetaL and FrameMaker, the questions returned once again to output. Of course, this time the focus was on using FrameMaker 8.0 as a PDF engine. In workflows where content is created and maintained in XML, but then has to be delivered in PDF (or print), FrameMaker 8.0 looks like an attractive possibility. There are a few flaws in this solution (such as translating xref elements for intra-document links into live links in PDF), but users are closer to a solution than they were six months ago.

We’ve posted PDFs of the slides from both sessions on SlideShare.

You can find the Introduction to XMetaL and DITA workshop slides at:

http://www.slideshare.net/Scriptorium/xmetal-dita-workshop-presentation

The slides for the session on DITA Support in FrameMaker and XMetaL are at:

http://www.slideshare.net/Scriptorium/dita-support-in-framemaker-and-xmetal-presentation

When you’re done browsing the slides, take a look on our site for information about how we can help you with your FrameMaker, XMetaL, DITA OT, and PDF problems.

It’s not that scary.



Hacking the DITA Open Toolkit

October 1st, 2008 by Simon Bate

(Scriptorium Publishing is a JustSystems Services Partner.)

My webinar, Key Elements on Customizing and Troubleshooting Output (or Hacking the DITA Open Toolkit) is now available for download. This event was jointly sponsored by Scriptorium Publishing and JustSystems. The recorded version is available here (registration required).

Between the modifications you can make with XMetaL Author Enterprise and those you can make in the bare-metal (excuse me) OT, there was a lot of territory to cover, so the presentation went a little long (90 minutes or so).

Preparing for this presentation caused me to reflect on the work I used to do, modifying FrameMaker templates. In that work, I figured that 90 to 95 percent of the work was simple style replacement (update the decorations on the master pages, change the font in this paragraph style, add new character spacing to this character style, and so on). That was the easy stuff.

The remaining 5 to 10 percent of the work was the really hard stuff, often where the order of text items changed (building a challenging chapter opener, reimplementing admonitions, replacing cross-reference formats). These are the things that took the time and had me reaching for FrameScript and Advil (not always in that order).

Modifying output from the Open Toolkit is similar. The changes to CSS (or attribute sets), headers and footers, basic page layouts, and so on are quite easy to do (whether you do them in XMetaL or directly). The place where you’ll spend much more of your time and effort is where the content affects the layout, where the order of content matters, or where you have specialized content. This is the realm of XSL and specialization.

Just as with the FrameMaker template modifications, a DITA implementer will have to perform these higher-order tasks at some time or another.

Are you looking at using the DITA Open Toolkit for production? How much do you expect to have to change in the Open Toolkit output? How are you preparing for this work?



An incomplete puzzle: DITA OT stylesheets

September 5th, 2008 by Simon Bate

A recent post on the dita-users Yahoo group (http://tech.groups.yahoo.com/group/dita-users/) asked how to customize the DITA OT stylesheets, given that there isn’t much documentation available.

From my work customizing and otherwise perverting the DITA OT, I can sympathize with these frustrations. When I started investigating OT customizations, I found many well-crafted tutorials on how to customize and specialize the OT. These were a great starting point, but they only got me so far. In its current state, the documentation is an incomplete jigsaw puzzle; the trees and buildings are filled in nicely, but the sky is still waiting for someone with patience. (Block that metaphor!)

Because there is no documentation available at the individual template level, you need to reconsider the task at hand. I look on it as debugging, decoding, or sleuthing. With that in mind, I find the following to be very useful:
  • Find a good visual grep-like utility. I use Agent Ransack, a free version of FileLocator Pro (it’s amazing). It lets me locate all files that contain a particular class identifier. The visual aspect of the tool allows me to see the context quickly.
  • Use a programmer’s editor that supports XML and XSL. We use Oxygen. Not only does it check validity and close tags automatically, but it also provides a handy sidebar that lists the templates and their modes.
  • Liberally spread <xsl:comment> or <xsl:message> directives through the stylesheets you’re examining. They help you figure out where you are. Use <xsl:value-of> or <xsl:copy-of> to figure out what you’ve got.
  • Once you’ve figured out what happens in one of the OT templates, add comments. Now the next time you come back to it, you won’t waste time.
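
If you’d rather script some of the sleuthing, here is a minimal Python sketch that combines the grep step with a template listing: it walks a stylesheet directory and reports every xsl:template whose match attribute mentions a given DITA class token, along with its mode. It assumes the stylesheets are .xsl files that parse as XML and that templates select elements with the usual contains(@class, ' topic/p ') pattern; the function name and output format are our own, not part of the OT. It reads only match attributes, so it won’t resolve import precedence or find named templates.

```python
import os
import xml.etree.ElementTree as ET

# Namespace URI used by every XSLT stylesheet.
XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def find_templates(root_dir, class_token):
    """List (file, match, mode) for each xsl:template whose match mentions class_token.

    class_token is a DITA class fragment such as " topic/p ". The surrounding
    spaces keep " topic/p " from also matching " topic/ph ".
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if not name.endswith(".xsl"):
                continue
            path = os.path.join(dirpath, name)
            try:
                tree = ET.parse(path)
            except ET.ParseError:
                continue  # skip anything that isn't well-formed XML
            for tmpl in tree.getroot().iter(f"{{{XSL_NS}}}template"):
                match = tmpl.get("match", "")
                if class_token in match:
                    # No mode attribute means the template is in the default mode.
                    hits.append((path, match, tmpl.get("mode", "#default")))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_templates.py <stylesheet-dir> " topic/p "
    for path, match, mode in find_templates(sys.argv[1], sys.argv[2]):
        print(f"{path}\n    match={match}\n    mode={mode}")
```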

Probably the best form of documentation that the OT could provide here is additional comments in the stylesheets, particularly about the order of processing. I find I add many comments about where to find the template that handles nodes from an <xsl:apply-templates> directive.

One further note. On Tuesday, September 23, I’ll be presenting the third webinar in our joint “Best Practices in Structured Authoring and Publishing” series with JustSystems. In this presentation I’ll describe a number of approaches you can use to customize DITA OT output. For more information, visit the JustSystems web site (http://na.justsystems.com/webinars.php).



Building communities one IP address at a time

June 24th, 2008 by Simon Bate

Day 2 at Gilbane: Continuing in the Social Computing track

Case studies in collaborative computing

  • Frank D’Angese – EarthKnowledge.net
  • Mark Yolton – SAP Communities
  • Kym Harrington – Sales Edge

Social communities in Web 2.0:

  • Work best when they are orchestrated, rather than moderated.
  • Rating and ranking of contributors increases the quality of the contributions.
  • SAP encourages quality by having users award 3, 6, or sometimes 12 points to contributors. When contributors reach 250 points, they’re considered “top contributors” or “highly-active contributors.”
  • The super-top contributors (1/100th of 1% of the best in points, professionalism, and maturity in collaboration) are identified as SAP Mentors.
  • Management can be wary of adopting Web 2.0 for fear of exposing weaknesses to competitors, because of general risk, and because these are disruptive technologies.
  • SAP sponsors occasional get-togethers so that contributors can meet each other.
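
To make the point scheme concrete, here is a toy Python model of it. It is purely illustrative and assumes that every award is exactly 3, 6, or 12 points and that crossing 250 points is the only rule; it doesn’t model the SAP Mentor tier, and SAP’s actual system surely differs. The names are hypothetical.

```python
# Assumptions (simplified from the session notes): awards are worth 3, 6,
# or 12 points, and 250 points earns "top contributor" status.
AWARD_VALUES = {3, 6, 12}
TOP_CONTRIBUTOR_POINTS = 250


def total_points(awards):
    """Sum a contributor's awards, rejecting values outside the allowed set."""
    for pts in awards:
        if pts not in AWARD_VALUES:
            raise ValueError(f"unexpected award value: {pts}")
    return sum(awards)


def contributor_level(awards):
    """Classify a contributor by their running point total."""
    if total_points(awards) >= TOP_CONTRIBUTOR_POINTS:
        return "top contributor"
    return "contributor"


if __name__ == "__main__":
    print(contributor_level([12] * 20))         # 240 points: contributor
    print(contributor_level([12] * 20 + [12]))  # 252 points: top contributor
```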

Some risks of sponsoring a user community:

  • Self-promoters and anti-social behavior. Both of these are often handled by the community. “Public humiliation is more effective than policing,” sez SAP.
  • If your company participates, beware of inauthentic interaction.
  • Sometimes users will vent about the product or company.
  • A “dead” community (no one goes there) will project a bad image.

Heard in several places: support calls can be grouped into two classes, how-to questions and true bugs. The point of a user community is to reduce (or entirely offload) support time spent on how-to questions.

User communities follow the “1-9-90” rule. That is, 1% of the community are highly active participants, 9% are occasional contributors, and 90% are consumers of the information. SAP focuses on the 9% and encourages them to move into the top 1%.

English tends to be the “lingua franca” of user communities. Multilingual sites present a challenge, because they can be difficult to moderate.

While some communities offer live chat, it doesn’t have the value of public forums, because the information in a forum is captured permanently. Live chat is not as public and much harder to track and follow.

Social Media at Tipping Point

  • Geoffrey Bock – Gilbane Group
  • Rachel Happe – IDC

Dell seeks out conversations about Dell and participates in them. This has a major positive impact on brand sentiment.

HP Wetpaint: the advice was so good that people in printing got annoyed. Worth visiting.

Use of LinkedIn varies by industry. The overall average is 14%. For low-tech companies, use is as low as 2%; high-tech it’s closer to 40%.
24% of all employees use some form of social networking.
Only 2% microblog (but I think that reflects the novelty of the concept more than its level of acceptance).

Gulp. 19% of the US workforce will retire in the next 5 years (this data is a couple of years old). There’s a lot of knowledge that will leave along with the people. How do we capture all the knowledge? The preceding figures show that not everyone will use social computing or communities to record what they do. One approach is to establish (and record) conversations with these people.

On the other hand, 41% of 18- to 35-year-olds use social media and expect it at work. Social media gives them more access to tools, people, and so on. It’s NOT just for fun; they know it works faster.

Total content of the internet:
161 Exabytes (Millions of Terabytes) in 2006
900+ Exabytes by 2010
It’s expected that at that time 80% will be user-created.
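
Taking the two figures at face value, the implied compound annual growth rate over the four years from 2006 to 2010 works out as follows. This is our own arithmetic, not a number quoted in the session, and because the 2010 figure is “900+”, treat the result as a lower bound:

$$
r = \left(\frac{900\ \text{EB}}{161\ \text{EB}}\right)^{1/4} - 1 \approx 5.59^{1/4} - 1 \approx 1.54 - 1 \approx 0.54
$$

That’s roughly 54% growth per year, or a little more than a fivefold increase overall.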

Technology for Ad Hoc Information Sharing (open source)

  • Peter Monks – Alfresco (filling in for John Newton)
  • Dries Buytaert – Drupal
  • Michael Wechner – Wyona.com

The enterprise software sales model is obsolete.
Open source eats $60B a year from traditional enterprise software. Open source is characterized as the “Ultimate Disruptive Technology”.

No cost of sales. Typically, 7/10 of sales costs fund the sales cycle; in open source, that 7/10 is plowed into R&D.

MSFT thinks SharePoint will be the next platform beyond Windows.

Humans are social animals. They don’t want processes. They need minitools to figure out what to do, particularly for non-technical documents. Technology gets in the way.

Check out Forbes Office Pranks (http://officepranks.forbes.com/).

Terms to pay attention to

  • Taxonomies – important to content management; critical to searching.
  • JSR-283 – An upcoming Java API for accessing content repositories (version 2.0 of the Content Repository for Java Technology API, or JCR).
  • RDF – Resource Description Framework, a language for representing metadata. I considered (and passed on) using RDF in one of our projects, mostly because of what seemed like massive overhead. It’s probably worth a second, or third, look.
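
For anyone taking that second look, here is a minimal sketch of what RDF (the Resource Description Framework) metadata can look like when written in the N-Triples syntax, built with plain Python string handling. Each output line is one subject-predicate-object statement ending in a period. The Dublin Core title and creator properties are real vocabulary terms; the document IRI and the metadata values are made-up examples, and a real project would more likely use an RDF library than hand-built strings.

```python
# Dublin Core element namespace (a widely used metadata vocabulary).
DC = "http://purl.org/dc/elements/1.1/"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes first."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def triple(subject_iri: str, predicate_iri: str, obj: str) -> str:
    """Format one subject-predicate-object statement with a literal object."""
    return f"<{subject_iri}> <{predicate_iri}> {literal(obj)} ."


if __name__ == "__main__":
    doc = "http://example.com/docs/hacking-the-dita-ot"  # hypothetical IRI
    print(triple(doc, DC + "title", "Hacking the DITA OT"))
    print(triple(doc, DC + "creator", "Scriptorium Publishing"))
```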

Toolsmith’s observations

There is no microblogging tool for the Enterprise (yet).

Many of the tools that make up Web 2.0 have been available for a while. What is different is that a) there is a critical mass of “next generation” tools and b) these tools have been embraced by web communities and the enterprise.

These include:

  • Developer communities
  • Groups/forums
  • Blogs
  • Wikis
  • Ajax


Federating your Enterprise Content

June 18th, 2008 by Simon Bate

Summarizing an interesting day at Gilbane San Francisco 2008.

The Gilbane conference focuses on Enterprise and Web Content Management. Not necessarily something directly in Scriptorium’s business, but there are many, many tie-ins with what we do do. In particular, structured documentation and XML are golden to Enterprise Content Management Systems.

The keynote address was delivered by Udi Manber, VP of Engineering for Google. One of the more interesting points in his talk was the engineering process within his group. An engineer doesn’t ask permission to do anything. Instead, they experiment, evaluate the results of what they did, and then get approval based on the data.

This address was followed by a discussion between Dan Farber, Editor-in-Chief at CNET news, and Denis Brown, SVP of Business User Imagineering at SAP. Some points:

Denis described the “consumerization” of the workforce. That is, just as people access Amazon to order books or CDs, when they go to work, they expect to be able to use corporate intranet web sites to perform similar tasks. AND the sites need to work just as smoothly as Amazon.

One topic that has arisen again and again is the security issues presented by Web 2.0. This led me to wonder about IT protections on web traffic. In my experience, IT has often presented a big hurdle for technical documentation teams to make content available on externally facing corporate web sites. There are often reams of paperwork to be filled out…and even more if the pages might be updated more than once every 6 months. Web 2.0 means traffic and content will be flowing both ways. Oh boy.

In the next session I attended, Steven Arnold discussed aspects of his recent Gilbane report “Beyond Search”. Beyond his gruff (world-weary?) demeanor, he had some good observations. Among them: Enterprise search doesn’t exist (because most enterprise docs are not available for indexing); 50-75% of users don’t think Enterprise search is working for them; there is no one-size-fits-all search, so buy what works for your organization and data; most Fortune 500 companies have 5 to 10 separate search engines in house.

Ross Mayfield (http://www.socialtext.com/blog/) brought some fresh air to a large panel on Collaboration and Social Computing. All presenters had an enterprise focus; that’s what they do. But Ross’ discussion of “people as first-class objects” was really good to hear.