Out of the box, the DITA Open Toolkit (OT) looks like it’s localization-ready. It handles the XML attribute xml:lang. It contains strings for more than 50 localizations. So it would seem that all you have to do is specify the language in your DITA files and maps and you’re good to go…or are you?
In this white paper, I’ll discuss some of the issues Scriptorium has encountered while generating localized output from the DITA Open Toolkit, and how we solved them.
The first part of this paper addresses general problems in localization and the DITA Open Toolkit. The second part discusses two plugins that present additional challenges:
- The DITA Open Toolkit PDF plugin, because it behaves differently than the other DITA Open Toolkit transformation types.
- The Microsoft HTML Help plugin, because the aging Microsoft HTML Help compiler is very sensitive about file encodings.
This white paper covers technical implementation details in the DITA Open Toolkit. You should be familiar with Apache Ant and should have a reading knowledge of XSLT.
How it’s s’posed to work
Localization in the DITA Open Toolkit begins with the XML attribute xml:lang. This attribute specifies the language and country of the contents of a particular element. Although you can use the attribute on any element in a DITA topic, you typically apply it to the root topic element. For instance, this xml:lang attribute indicates that the concept topic contains Canadian French content:
<concept id="c_202934802" xml:lang="fr-ca">
When a DITA topic is localized, translators should modify the xml:lang attribute.
Each localized topic and map in a localized information set must specify the language and country for its content. Note that changing the xml:lang attribute in the top-level DITA map or bookmap is not sufficient.
All topics and maps must indicate their language and locale because of the way the DITA Open Toolkit generates HTML output. When DITA topics are transformed to HTML, each topic is processed separately without regard to the DITA map that contains it.
It’s all about strings
In transforming DITA topics to an output format, there are times when your transform inserts text that does not originate in the source content into the output stream. In English, chapters often begin with the word “Chapter”; admonitions need the text “Note,” “Warning,” or whatever is appropriate; and page headers and footers might include some boilerplate text.
These strings are defined in the transforms that create output. The transformation stylesheets usually contain multiple sets of strings, one for each language that the transformation supports.
<!-- Nav bar buttons -->
<str name="contents_button">Contents</str>
<str name="index_button">Index</str>
<str name="search_button">Search</str>
<str name="print_button">Print</str>
It’s good programming practice to store commonly used strings in a single location rather than hard-code each string every time it’s used. This eliminates duplication and makes maintenance and translation much easier. Additionally, when a different language is required, a file (or set of files) with strings for the target language can be easily substituted for the initial set of strings.
The DITA Open Toolkit uses this principle. Strings are organized in sets of parallel files; each set contains one file per language.
The DITA Open Toolkit and getString
In addition to the strings files, the DITA Open Toolkit contains XSL templates that support string substitution and localization. When a DITA Open Toolkit transform processes a topic that requires a standard string (for example “Note” in a <note> element), it calls the getString template:
<xsl:call-template name="getString">
<xsl:with-param name="stringName" select="'Note'"/>
</xsl:call-template>
The getString template uses the xml:lang attribute in the current topic to find the language-specific string that matches the stringName. If the locale in xml:lang is not recognized, the locale specified by the parameter DEFAULTLANG is used; if DEFAULTLANG is not specified, the English string is used.
The getString template is defined in $DITA_HOME/xsl/common/dita-utilities.xsl. The strings themselves are stored in xsl/common/strings-lang-loc.xml.
How the strings are organized
The DITA Open Toolkit plugin architecture enables you to localize your plugins by adding your own string definitions for use by getString. First, let’s take a closer look at the getString mechanism.
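The root of the getString mechanism is the file $DITA_HOME/xsl/common/allstrings.xml. Out of the box, this file contains a single entry, which references $DITA_HOME/xsl/common/strings.xml.
The file $DITA_HOME/xsl/common/strings.xml contains one or more <lang> elements. Each <lang> element maps a language or locale to a language string file: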
<lang xml:lang="ar" filename="strings-ar-eg.xml"/>
<lang xml:lang="ar-eg" filename="strings-ar-eg.xml"/>
The filename attribute specifies the language string file that contains the strings for that language and country. Each of the language string files contains <str> elements. Each <str> element defines the language-specific string for a particular text string. For instance (from the German strings-de-de.xml):
<str name="Related tasks">Zugehörige Tasks</str>
Note that the name of the string is always in English. Each language string file should define a parallel set of <str> elements (each with the same name attributes, but with different, language-specific text). So the Spanish strings-es-es.xml contains:
<str name="Related tasks">Tareas relacionadas</str>
and the U.S. English strings-en-us.xml contains:
<str name="Related tasks">Related tasks</str>
Adding strings to your plugin
In your plugin, you can define new strings, override the existing strings, or add additional files for languages that the Open Toolkit doesn’t support out of the box. One of the features of the plugin architecture is that you do not have to copy all the strings; you just add what is new or changed.
Start by copying $DITA_HOME/xsl/common/strings.xml to xsl/common/my-strings.xml in your plugin. Edit the file so that it indicates only the string files that you are modifying or need to add. Make sure you use the standard ISO 2-letter codes for language (ISO 639-1) and country (ISO 3166). The official site for the language codes is http://www.loc.gov/standards/iso639-2/langhome.html; the country codes are listed at http://ftp.ics.uci.edu/pub/ietf/http/related/iso3166.txt.
Copy the language string files you plan to modify from the $DITA_HOME/xsl/common folder in the default DITA Open Toolkit into your plugin’s xsl/common folder. If you need to add languages, create new language files by making a copy of $DITA_HOME/xsl/common/strings-en-us.xml and renaming it to match the new language.
Finally, modify your plugin.xml file so that it incorporates your my-strings.xml into the allstrings.xml file. Add this line to your plugin.xml file:
<feature extension="dita.xsl.strings" value="my-strings.xml" type="file"/>
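As a rough illustration of the files described above, a plugin that overrides a single string for Canadian French might look something like the following. The file name my-strings-fr-ca.xml is hypothetical, and the langlist root element is an assumption about the stock strings.xml; check it against your copy of the toolkit before relying on it.

```xml
<!-- xsl/common/my-strings.xml in your plugin: lists only the files you add or change.
     Assumption: the root element matches the one used in the stock strings.xml. -->
<langlist>
  <lang xml:lang="fr-ca" filename="my-strings-fr-ca.xml"/>
</langlist>

<!-- xsl/common/my-strings-fr-ca.xml (hypothetical file name) -->
<strings xml:lang="fr-ca">
  <!-- The name attribute stays in English; only the text is translated. -->
  <str name="expand_string">Cliquer pour dérouler</str>
</strings>
```

Because only new or changed strings are listed, every other string still comes from the toolkit's own files.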
Limits of single-string replacement
The getString template is fine for inserting a single word (such as “Note”). In English this works equally well for adding the word “Chapter” to chapter heads. However, not all languages are constructed in the same way as English. Some languages position strings and numbers differently than English. Other languages may use multiple words or characters where English uses a single word.
In English, we write “Figure 9.” But in Hungarian, this is written “9. ábra.” Thus, a transform must make a language-specific distinction when generating Hungarian.
In Japanese, a chapter title consists of the Kanji dai, the chapter number, and then the Kanji shou. Because getString replaces a single string at a time, this two-part insertion cannot be done in a single call to getString. Instead, the transformation must define two separate strings and then insert them in order around the number. To do this, the transformation must also include special-case code that detects that Japanese is the target language.
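The special-case code described here might look roughly like the following fragment. It is only a sketch: the string names Chapter_prefix_ja and Chapter_suffix_ja, and the variables $lang and $chapter-number, are hypothetical and would have to be defined elsewhere in your stylesheet and strings files.

```xml
<!-- Sketch only. Assumes $lang holds the lowercase xml:lang value of the topic
     and $chapter-number holds the computed chapter number (both hypothetical). -->
<xsl:choose>
  <xsl:when test="starts-with($lang, 'ja')">
    <!-- Japanese: first string, then the number, then second string -->
    <xsl:call-template name="getString">
      <xsl:with-param name="stringName" select="'Chapter_prefix_ja'"/>
    </xsl:call-template>
    <xsl:value-of select="$chapter-number"/>
    <xsl:call-template name="getString">
      <xsl:with-param name="stringName" select="'Chapter_suffix_ja'"/>
    </xsl:call-template>
  </xsl:when>
  <xsl:otherwise>
    <!-- Other languages: one string, a space, then the number -->
    <xsl:call-template name="getString">
      <xsl:with-param name="stringName" select="'Chapter'"/>
    </xsl:call-template>
    <xsl:text> </xsl:text>
    <xsl:value-of select="$chapter-number"/>
  </xsl:otherwise>
</xsl:choose>
```

Each additional language with its own word order needs another branch, which is how this approach becomes hard to maintain.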
There are other problems of this kind (for example, a string might contain two values that appear in a different order in different languages). When you encounter these types of problems, it’s tempting to add language-specific code to your XSL, but that leads to code that’s hard to maintain.
For this reason, the PDF plugin implements a different string substitution mechanism. (The PDF implementation is described later in this paper.)
String substitution beyond XSL
In a given transformation type, most deliverables are created by transforming DITA sources with XSL. However, there are times when the deliverables include additional files that may contain language-specific strings. In particular, we have encountered:
- HTML files that contain buttons, labels, and other language-specific information.
- JavaScript files that modify the appearance of web pages, based on user interaction.
The files contained text strings that were language specific, but we didn’t want to create these files using XSL transforms, mostly because it would make them difficult to maintain. Instead, we used Ant to perform the string substitution in file templates.
There are four steps to this solution. Because Ant projects vary widely in implementation, it’s not possible to show exact implementation details; these steps sketch the general ideas behind what you need to do:
- Create an XSL stylesheet that reads strings from the strings.xml files, based on the xml:lang attribute of the DITA map. The stylesheet stores the strings and their name attributes in an XML properties file (see http://java.sun.com/dtd/properties.dtd or http://download.oracle.com/javase/1,5.0/docs/api/java/util/Properties.html). Thus the contents of this English strings file:
<strings xml:lang="en-us"> <!-- US English -->
...
<str name="expand_string">Click to expand</str>
...
</strings>
is transformed into this properties file:
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
...
<entry key="expand_string">Click to expand</entry>
...
</properties>
Or the contents of this French strings file:
<strings xml:lang="fr-ca"> <!-- Canadian French -->
...
<str name="expand_string">Cliquer pour dérouler</str>
...
</strings>
is transformed into this properties file:
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
...
<entry key="expand_string">Cliquer pour dérouler</entry>
...
</properties>
- Modify your Ant project so that immediately after running the string conversion XSL, it calls the task xmlpropertyreader. This task reads a properties file and creates Ant properties based on the file’s contents. This Ant task is specific to the DITA Open Toolkit and is defined in build_init.xml. Make sure the properties file is created and the xmlpropertyreader task is called in your Ant project before the project calls targets that require these properties.
- Create templates for the HTML and JavaScript files. When string substitution is necessary, use the string names defined in your strings files, but encode them as Ant property references (${myProperty}). For instance, this JavaScript string assignment statement defines an error message indicating that quotes must be balanced:
loc_strings.srch_badquote = '${balance_quotes}';
- Use the Ant <copy> task to replace the Ant properties (such as ${balance_quotes}) with the language-specific strings. Use <copy> to copy your template files from their source folder (usually in resources) to the output folder. In the copy task, add a <filterchain> element that contains a nested <expandproperties> element. The <expandproperties> element replaces all Ant property references in the copied files with the appropriate strings.
<copy todir="${output.dir}" overwrite="true" filtering="yes">
<fileset dir="${sps_tripane.plugin.dir}${file.separator}resource${file.separator}">
<include name="**/*.js"/>
<include name="nav_bar.html"/>
</fileset>
<filterchain>
<expandproperties/>
</filterchain>
</copy>
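For the first of the four steps, a conversion stylesheet could be sketched as follows. This is a minimal version that assumes it is run directly against one language string file (with a strings root element, as in the examples above); choosing that file from the map's xml:lang value is left to the Ant project and is not shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: converts <strings>/<str> into a Java XML properties file. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emit the DOCTYPE that XML properties files are expected to carry. -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"
              doctype-system="http://java.sun.com/dtd/properties.dtd"/>

  <!-- The root <strings> element becomes <properties>. -->
  <xsl:template match="/strings">
    <properties>
      <xsl:apply-templates select="str"/>
    </properties>
  </xsl:template>

  <!-- Each <str name="..."> becomes <entry key="...">, with the text preserved. -->
  <xsl:template match="str">
    <entry key="{@name}">
      <xsl:value-of select="."/>
    </entry>
  </xsl:template>

</xsl:stylesheet>
```

Keep in mind that string names containing spaces (such as "Related tasks") become property names containing spaces, which are awkward to reference in Ant. Names such as expand_string avoid the problem.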
Sorting indexes
Index generation is a crucial aspect of localization because indexes must be sorted appropriately for the current language. Out of the box, only three DITA Open Toolkit transformation types provide index generation (JavaHelp, Eclipse Help, and HTML Help); the others do not.
For these three transformation types, extracting, sorting, and generating indexes is not implemented in XSL. Instead, these steps are performed by Java routines in lib/dost.jar. These routines use the language-specific sorting sequences that are a part of Java.
If you are creating a plugin and need an index, you have three alternatives:
- Download and modify the sources for dost.jar. Add support for your output format, recompile, and include the new dost.jar in your plugin.
- Use one of the existing index types available from dost.jar, then modify the output (perhaps using XSL) to suit your needs.
- Create index-building routines in another programming language, such as XSLT. However, if you do this, you need to ensure that any sorting you do supports all your target languages and countries.
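If you take the XSLT route, the lang attribute of xsl:sort is the standard hook for language-aware ordering. The sketch below assumes an input document that contains indexterm elements and writes a flat list. The output element names (index, entry) and the sortLang parameter are hypothetical, and how well a given language is collated depends on your XSLT processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: collects top-level index terms and sorts them for one language. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: pass the target language from Ant, e.g. 'hu'. -->
  <xsl:param name="sortLang" select="'en'"/>

  <xsl:template match="/">
    <index>
      <!-- Only top-level index terms; nested terms are ignored in this sketch. -->
      <xsl:for-each select="//indexterm[not(parent::indexterm)]">
        <!-- lang is an attribute value template, so it can take the parameter. -->
        <xsl:sort select="normalize-space(text()[1])" lang="{$sortLang}"/>
        <entry>
          <xsl:value-of select="normalize-space(text()[1])"/>
        </entry>
      </xsl:for-each>
    </index>
  </xsl:template>

</xsl:stylesheet>
```

Test the result with real index terms in every target language, because processors differ in which languages they collate correctly.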
The PDF plugin
The PDF plugin presents both a challenge and an opportunity in localization. The challenge is that the PDF plugin does not support index generation out of the box; the opportunity is that its string substitution templates are much more flexible than the simple string substitution provided by the baseline DITA Open Toolkit.
The PDF plugin contains a Customization folder where you should store files that modify the behavior of the PDF plugin. It’s a good idea to use the Customization folder for two reasons:
- It allows you to keep your modifications in a small, portable package. If you need to pass your modifications to another user of the DITA Open Toolkit, you can give them the Customization folder rather than the entire PDF plugin.
- When a new version of the DITA Open Toolkit (or the PDF plugin) is released, you can copy your Customization folder to the new version of the PDF plugin. (However, it’s always a good idea to compare the old and new versions of the plugin, to ensure there aren’t any changes that will affect your modifications.)
String substitution in the PDF plugin
Idiom Technologies, the creator of the default PDF plugin in the DITA Open Toolkit, was aware of many issues associated with internationalization and localization. In creating the PDF plugin, Idiom realized that returning a single string was not flexible enough for some situations. Instead, Idiom completely ignored the getString template (and all the strings files) and implemented a much more sophisticated series of templates, all of which are invoked through the template insertVariable.
The XSL templates for the PDF plugin are defined in $DITA_HOME/demo/fo/xsl/common/vars.xsl. The language- and country-specific strings are stored in $DITA_HOME/demo/fo/cfg/common/vars/lang_LOC.xml. Unfortunately, the folder contains localizations for only seven languages (English, French, German, Italian, Japanese, Spanish, and simplified Chinese). If you need to create PDF output for other languages, your translators will have to create language-specific versions of the variables files. It might also be necessary to create files that define the sort sequence for your language or identify some special characters used in the new languages.
When the PDF plugin requires a simple string, it calls the template insertVariable, with the parameter theVariableID that identifies the variable (the string) to be inserted.
<xsl:call-template name="insertVariable">
<xsl:with-param name="theVariableID" select="'Note'"/>
</xsl:call-template>
The insertVariable template is not limited to returning a simple string.
The string referenced by theVariableID can contain XML markup (<param> elements) that indicate where parameter values can be inserted. The parameter theParameters contains a series of parameter values that are inserted in place of the <param> elements. Thus, the string returned by insertVariable includes the substituted parameter values.
For example, here’s the variable definition for a Figure title:
<variable id="Figure">Figure <param ref-name="number"/>: <param ref-name="title"/></variable>
To use this variable along with the number and title parameters, the call to insertVariable looks like this:
<xsl:call-template name="insertVariable">
<xsl:with-param name="theVariableID" select="'Figure'"/>
<xsl:with-param name="theParameters">
<number>
<!-- XSL to determine the figure number. -->
</number>
<title>
<!-- XSL to retrieve the figure title. -->
</title>
</xsl:with-param>
</xsl:call-template>
If the target language calls for the number and title in a different order, the translator can change the position of the parameters in the translated strings file.
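To make this concrete, here is how the Figure variable might be translated for Hungarian, where the number comes before the word for "figure." The Hungarian wording and punctuation are illustrative only; your translators decide the real text.

```xml
<!-- English: the definition shown earlier -->
<variable id="Figure">Figure <param ref-name="number"/>: <param ref-name="title"/></variable>

<!-- Hungarian (illustrative): the number parameter moves in front of the word -->
<variable id="Figure"><param ref-name="number"/>. ábra: <param ref-name="title"/></variable>
```

The call to insertVariable stays the same for both languages. Only the variables file changes.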
Adding your own definitions
Adding your own definitions to the PDF strings takes fewer steps than the DITA Open Toolkit getString method, but it does not use the plugin architecture. Instead, you use the Customization folder in the PDF plugin.
To add variables for your own customization, create new language-specific files in Customization/common/vars that parallel the files in cfg/common/vars. The files should have the same naming convention (lang_LOC.xml), and the content of the files should use the same syntax:
<vars xmlns="http://www.idiominc.com/opentopic/vars">
<variable id="id-value">string</variable>
<variable id="id-value">string</variable>
...
</vars>
PDF indexes
When the PDF plugin processes a map, it merges all the individual DITA topic files into a single XML file named stage1.xml. As part of merging the files, the plugin gathers all the index entries.
The plugin performs this action for book maps and DITA maps; it doesn’t matter whether the
The entries are sorted by the com.idiominc.ws.opentopic.fo.index2.IndexPreprocessorTask Java class, which gets its sort order from the sort-order files in cfg/common/index. The files in this folder are named like the files in cfg/common/vars, with the form: lang_LOC.xml. Each file defines the character sorting order for one language and country.
If you are adding a new language or locale to the PDF plugin, create a new sort-order file. Start with an existing sort-order file that is close to your target language and country and copy that file to Customization/common/index.
Special characters and fonts
The PDF plugin includes another series of localization files in the folder cfg/fo/i18n. Some fonts support more Unicode code points (character values) than other fonts. This is particularly true when it comes to special characters, such as trademark, registered trademark, and other characters. The PDF plugin allows you to identify special characters that the transform might encounter in DITA topics and associate those characters with fonts that implement them.
The folder cfg/fo/i18n contains a series of language-specific files that identify these special characters. Each character is assigned to a particular character set. A preprocessing step in the PDF plugin (using the com.idiominc.ws.opentopic.fo.i18n.PreprocessorTask Java class) locates these characters in the DITA source files and ensures that they are assigned the correct font in the final XSL-FO output.
Unfortunately, there’s not a lot of documentation in the PDF plugin about the names of the character sets and how to use these files. (The best information we have found on this topic is a posting by Deborah Pickett on the dita-users Yahoo group, http://tech.groups.yahoo.com/group/dita-users/message/11110.) You can find (and change) the names of the character sets by examining the char-set attributes for the
In general, you assign characters to character sets when creating a new localization for the PDF plugin, rather than customizing an existing localization. However, if you need to add characters to a character set:
- Make a copy of demo/fo/Customization/catalog.xml.orig and name it catalog.xml.
- Edit the new catalog.xml and uncomment one or more of the lines in the section identified with <!-- I18N configuration override entries. -->.
<uri name="cfg:fo/i18n/en_US.xml" uri="fo/i18n/en_US.xml"/>
- For each of the lines that you uncomment, create a file in Customization/fo/i18n.
- Edit each new file; add or change character values in the file as necessary. Use the syntax of the files in cfg/fo/i18n as a model.
Microsoft HTML Help
Some of our biggest challenges in localizing DITA Open Toolkit output have been in generating Microsoft HTML Help, particularly for non-Western languages. There are a number of problems inherent in Microsoft HTML Help:
- It doesn’t deal consistently with Unicode files.
- The files used by the HTML Help (HHP, HHC, HHK) must use language-dependent encodings.
- The HTML Help compiler fails if the HHC file contains a Byte Order Mark (BOM).
- For some languages (such as Russian), it’s important to specify a font in the index and table of contents.
- To correctly generate help (with search and index) for a given locale, you must run the HTML Help compiler on a computer that is configured for the target language.
Despite these limitations, it is possible to use the DITA Open Toolkit to generate localized HTML Help output from localized DITA sources.
About file encodings
When working on localizations of HTML Help, you’ll spend a lot of time making sure the files are encoded correctly and that they contain the right encoding information. Thus, it’s important to understand the various forms of text file encoding.
There are many different ways of representing (or encoding) characters in a text file. In English and Western European languages, a text file might use ASCII, ISO-8859-1, or Unicode. However, other languages use characters that do not appear in the ASCII or ISO-8859 character sets. Historically, text files in these languages have used variations of ISO-8859 or language-specific encodings (such as Shift_JIS for Japanese or BIG5 for Chinese). More recently, text files in these languages use Unicode.
This multiplicity of text encoding schemes creates problems when a program needs to open a file and process the text within it. Whether the file uses ASCII, ISO-8859, or Unicode, it’s important to know which text encoding scheme was used. Unfortunately, most file systems do not track information about the file encoding. Instead, there are a limited number of ways a program can determine the encoding scheme used in a particular text file:
- The file can begin with information about how the rest of the data is encoded. For example, HTML files use the <meta> tag:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1254">
XML files use the XML declaration:
<?xml version="1.1" encoding="UTF-8"?>
- A Unicode file will usually (but not always) begin with a Byte Order Mark (or BOM), which is the code point U+FEFF. The BOM occupies two bytes in UTF-16 and three bytes (EF BB BF) in UTF-8. The BOM serves two purposes:
- It indicates the order of bytes (little-endian or big-endian) in UTF-16 and UTF-32 files.
- It helps programs detect Unicode-encoded files.
The only problem with the BOM is that older programs (such as the Microsoft HTML Help compiler) treat the BOM as a “normal” document character. This can have unfortunate consequences, such as causing the compiler to crash.
- Some file systems or programs can scan the contents of a file, looking for particular indicator characters that might indicate a specific encoding.
The HTML Help project files
There are different types of files used to build HTML Help. Each file type has its own peculiarities.
| File purpose (extension) | Comments |
|---|---|
| HTML Help project file (.hhp) | Must include Microsoft Language ID. |
| Table of Contents (.hhc) | File cannot include a BOM. Must specify encoding in <meta> tag. Specify font for non-Western languages (optional). |
| Index (.hhk) | Specify font for non-Western languages (optional). |
| Help content (.html) | Must specify encoding in <meta> tag. |
| List Help IDs (.h) | Plain text file. |
| Help aliases (any) | Plain text file. |
Language encodings used by HTML Help
When working on projects with multiple locales, we have found it useful to build a spreadsheet or table in which to track locales, encodings, Microsoft language IDs (used in the HHP file), and required fonts.
This table shows some encoding schemes that we have found work for various languages. This table is by no means complete, but it does show the work involved in generating HTML Help output for many languages.
The table contains separate columns for “HTML Encoding” (that is, the content HTML files), “HHW Encoding” (that is, the .hhp and .hhk files), and “HHC Encoding” (the .hhc file). Some of our clients have reported achieving better results when using different encoding schemes for these three file types; experiment with the encoding types until you get the results you need.
| Locale – ISO code | Microsoft Language ID | HTML Encoding | HHW Encoding | HHC Encoding |
|---|---|---|---|---|
| English – en-us | 0x0409 | iso-8859-1 | iso-8859-1 | iso-8859-1 |
| Spanish – es-es | 0x0C0A | iso-8859-1 | iso-8859-1 | iso-8859-1 |
| French – fr-fr | 0x040C | iso-8859-1 | iso-8859-1 | iso-8859-1 |
| German – de-de | 0x0407 | iso-8859-1 | iso-8859-1 | iso-8859-1 |
| Italian – it-it | 0x0410 | iso-8859-1 | iso-8859-1 | iso-8859-1 |
| Chinese (simplified) – zh-cn | 0x0804 | UTF-8* | UTF-8* | UTF-8* |
| Russian – ru-ru | 0x0419 | UTF-8* | UTF-8* | UTF-8* |
| Japanese – ja-jp | 0x0411 | shift_jis | shift_jis | shift_jis |
| Turkish – tr-tr | 0x041F | windows-1254 | windows-1254 | windows-1254 |
* The HTML Help compiler handles UTF-8 correctly for these languages only when running on Windows configured with language support for these languages.
In our modified HTML Help plugin, we store the various encoding types for each locale in the language-specific strings files:
<str name="Language Code ID">0x409 English (United States)</str>
<str name="html.encoding.type">ISO-8859-1</str>
<str name="hhw.encoding.type">ISO-8859-1</str>
<str name="hhc.encoding.type">ISO-8859-1</str>
In our Ant project file, we convert the contents of the string file to Ant properties so that later in Ant processing we can use the properties when creating files or changing the file encoding.
Modifying file encoding
Most XSL transformation engines allow you to specify the encoding of the output file, but we have found that this sometimes leads to complications. The safest strategy is to generate all files using the UTF-8 encoding. That way all files start in a consistent state.
Once all the files are generated, use Ant to change the encoding information and to remove the BOM (if necessary).
Start by converting the strings in strings.xml to a properties file, and then convert the properties file to Ant properties.
The key to modifying the file encoding is to use the Ant <replace> task.
This example expects that all HTML files contain the string content="text/html; charset=UTF-8" and replaces that string with a new string specifying the correct encoding. The example also illustrates one of the benefits of starting with UTF-8 as the standard encoding:
<!-- Fix metas in *.html, then copy to change encoding. -->
<echo>Fixing *.html. Changing charset=UTF-8 to charset=${html.encoding.type}</echo>
<replace dir="${output.dir}" encoding="UTF-8"
         token='content="text/html; charset=UTF-8"'
         value='content="text/html; charset=${html.encoding.type}"'
         summary="yes">
<include name="**/*.html"/>
</replace>
You’ll have to perform this action on several different file types (HTML, HHP, HHK, HHC, and so on), and you may have to use a different encoding type for each file type.
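One way to avoid repeating the replace task for every file type is an Ant macro. The sketch below is an assumption-laden illustration rather than part of our plugin: the macro name fix-charset is hypothetical, and it presumes that each targeted file contains the literal text charset=UTF-8 and that properties such as hhc.encoding.type have already been created from the strings files.

```xml
<!-- Sketch: a reusable wrapper around <replace>. The macro name is hypothetical. -->
<macrodef name="fix-charset">
  <attribute name="pattern"/>  <!-- file pattern, for example **/*.html -->
  <attribute name="charset"/>  <!-- target encoding name for that file type -->
  <sequential>
    <replace dir="${output.dir}" encoding="UTF-8"
             token="charset=UTF-8" value="charset=@{charset}"
             summary="yes">
      <include name="@{pattern}"/>
    </replace>
  </sequential>
</macrodef>

<!-- One call per file type, each with its own encoding property. -->
<fix-charset pattern="**/*.html" charset="${html.encoding.type}"/>
<fix-charset pattern="**/*.hhc" charset="${hhc.encoding.type}"/>
<fix-charset pattern="**/*.hhk" charset="${hhw.encoding.type}"/>
```

Files that carry no charset declaration are left unchanged by this macro and need their own handling.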
Removing the BOM
Removing the BOM requires two Ant <copy> tasks.
These steps presume you have already converted strings into Ant properties, so the property toc.encoding.type is ready to go.
Start by copying the HHC file to a temporary location:
<!-- Remove the BOM from UTF-8 files -->
<copy todir="${output.temp}"
overwrite="true"
failonerror="false">
<fileset dir="${output.dir}">
<include name="**/*.hhc"/>
</fileset>
</copy>
Remove the BOM by copying the HHC file back to its original location, specifying both the input file encoding (encoding="UTF-8") and the output file encoding (outputencoding="${toc.encoding.type}"). If the outputencoding attribute is anything other than UTF-8, the Ant <copy> task removes the BOM:
<copy todir="${output.dir}"
overwrite="true"
failonerror="false"
encoding="UTF-8"
outputencoding="${toc.encoding.type}">
<fileset dir="${output.temp}">
<include name="**/*.hhc"/>
</fileset>
</copy>
Specifying fonts for table of contents and index files
For some languages, it’s necessary to specify the font that the Microsoft HTML Help viewer uses when displaying table of contents and index files. This is because the table of contents or index may contain a character that is only available in a limited set of fonts. By specifying the font, you ensure that the characters in the language are available to the table of contents and index in the help.
To specify a font, add an <OBJECT> tag to (or modify the first <OBJECT> tag in) the .hhc or .hhk file. The <OBJECT> tag must be the first tag within the <BODY> tag. The <OBJECT> tag contains a <param> tag whose name attribute is Font and whose value attribute specifies the font (see Microsoft HTML Help workshop documentation for further information about specifying the font).
<BODY><OBJECT type="text/site properties">
<param name="Font" value="font-specification">
</OBJECT> ...
Summary
The DITA Open Toolkit provides rich support for localization.
The getString mechanism is useful and extensible. Out of the box, the Open Toolkit has strings files for more than 50 languages and locales (for HTML). The strings files can also be converted to Ant properties.
Sorting indexes is handled in Java and can be further modified or supplemented in a number of ways.
The PDF plugin created by Idiom further extends the string replacement mechanisms by allowing for multiple, position-independent string replacement. However, the PDF plugin supports only seven languages (as of DITA Open Toolkit 1.5.3).
Creating HTML Help for localizations is a challenge, because the technology is aging. But with the right care, it can be done.
Contact us
Scriptorium Publishing Services helps organizations with large amounts of technical content streamline their publishing processes, which results in cost reduction and quality improvements.
We can advise you on content strategy, provide technical communication services, convert documents, customize the DITA Open Toolkit, and provide training.
If you need help in any of these areas, contact Scriptorium at info@scriptorium.com, or call 919-481-2701. You can find us on the web at www.scriptorium.com.



The document says: To correctly generate help (with search and index) for a given locale, you must run the HTML Help compiler on a computer that is configured for the target language.
A couple of years ago, we found a way to build our Japanese and Simplified Chinese HTML Help on our English build machine. There is a Microsoft utility called apploc.exe. See:
http://www.microsoft.com/download/en/details.aspx?id=13209.
Our build includes a step like this:
"%appdir%\apploc.exe" "%SystemRoot%\system32\cmd.exe" "/c %batchdir%\tt.bat" /L0411
apploc.exe launches tt.bat in the Japanese locale. The batch file contains the hhc.exe command, fully qualified, and all its parameters. (Use /L0804 for the Simplified Chinese locale.) We get localized ToC, index, search, and content. The window title is localized on a localized machine. On an English machine, the content is localized and displays properly. The only difference is that the window title is always “HTML Help”, which seems to be a “feature” of hh.exe.
We specify the full path for every file name. The /c is required because we’re using cmd.exe to launch a batch file and that requires the DOS command interpreter.
apploc creates a new window, which disappears when the command is done. Our batch file records the command output to a file.
We have not found any issues with this approach and have used it for several years.
Steve McDowell
Director of Engineering
SQL Anywhere
Waterloo, Ontario, Canada
Thanks for the tip, Steve! I’ve been reading about apploc.exe, but have seen mixed comments about it. This inspires me to give it a try.