Reengineering the humble datasheet
This year is shaping up as the Year of the Many Datasheets. Several customers approached us with variations on this theme:
Our datasheets are currently created in [FrameMaker | InDesign | structured FrameMaker]. We publish them to PDF and put them on our web site. We’d like to create HTML versions without maintaining a separate copy of the information, but right now [the process requires a lot of manual intervention | we’re not doing HTML output at all]. It would be nice if we could use information from the product management database instead of [retyping | copying and pasting].
Our current editing process is [redact horrifying description of tedious, overly elaborate, appallingly inefficient approach] because of [management | external factors | acquisitions | lack of training]+.
Localization is [a problem | a disaster | out of the question].
In all cases, the workflow was originally set up for print/PDF production. Most datasheet creators strive to keep page counts as low as possible (because these documents are or were being printed in large numbers). As a result, elaborate page layouts and formatting exceptions are, well, unexceptional.
In every case, some further investigation resulted in the discovery of a database, usually owned by product management, that was the theoretical source of the information in the datasheets. In no case was there any programmatic connection to the database—there was copying and pasting, there was export to Excel, and there was more than one case where the datasheet people did not even have access to the database.
In a perfect world, these datasheets would be completely automated. The specifications and other raw data would be exported from the product management database, and the narrative content (such as detailed product descriptions) would be added to the data.
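To make that ideal concrete, here is a minimal sketch in Python of merging exported specifications with hand-written narrative to produce an HTML datasheet. Everything in it is an assumption for illustration: the `specs` table, its columns, the product ID, and the function names are hypothetical, and an in-memory SQLite database stands in for whatever product management system is actually in use.

```python
import sqlite3
from html import escape


def build_demo_db():
    """Create a stand-in product database (hypothetical schema, sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE specs "
        "(product_id TEXT, name TEXT, value TEXT, unit TEXT, sort_order INTEGER)"
    )
    conn.executemany(
        "INSERT INTO specs VALUES (?, ?, ?, ?, ?)",
        [
            ("X100", "Weight", "1.2", "kg", 2),
            ("X100", "Input voltage", "12", "V", 1),
        ],
    )
    return conn


def load_specs(conn, product_id):
    """Return (name, value, unit) rows for one product in display order."""
    cursor = conn.execute(
        "SELECT name, value, unit FROM specs "
        "WHERE product_id = ? ORDER BY sort_order",
        (product_id,),
    )
    return cursor.fetchall()


def render_datasheet(title, narrative, specs):
    """Combine narrative text and database specs into one HTML fragment."""
    spec_rows = "\n".join(
        "    <tr><th>{}</th><td>{}</td></tr>".format(
            escape(name), escape(f"{value} {unit}".strip())
        )
        for name, value, unit in specs
    )
    return (
        f"<h1>{escape(title)}</h1>\n"
        f"<p>{escape(narrative)}</p>\n"
        f"<table>\n{spec_rows}\n</table>\n"
    )


if __name__ == "__main__":
    conn = build_demo_db()
    print(render_datasheet("X100", "Compact and rugged.", load_specs(conn, "X100")))
```

The point of the split is that the narrative stays under the writers' control while the specification table is regenerated from the database on every build, so nobody retypes a value.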
In practice, only one of our customers was able to achieve that level of automation. Not coincidentally, this was the customer who a) had the most complete information in their datasheets and b) gave the lowest priority to PDF output. (They built a database and then used the information from the content files as input for the database.)
For the others, we had a few obstacles to complete automation:
- Inconsistent formatting for the datasheets. Copyfitting and page count reduction were (and are) given priority over formatting consistency.
- Incomplete databases. Although there was information in the product management database, it was incomplete and in some cases incorrect. The datasheets, which were maintained by hand, had more accurate information than the database.
- Unwillingness or inability to make drastic workflow changes. In some cases, the new workflow had to accommodate the skills and interests of the current staff, so we could not introduce new tools and technologies. That, in turn, limited our automation to what could be done within the current toolset.
With all that in mind, we did build some really interesting datasheet solutions, including the following:
- A scripted solution to convert unstructured FrameMaker to a format that could be imported into a database.
- A solution that updated content in a page layout application with data from a database.
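As a rough illustration of the second idea (not the actual implementation, which ran inside a page layout application), the Python sketch below refreshes tagged fields in layout text from a database record. The `{{field}}` placeholder syntax, the `fill_fields` function, and the sample record are all hypothetical. Because the databases we encountered were often incomplete, the sketch leaves a placeholder untouched and reports it when the record has no value, rather than overwriting good content with nothing.

```python
import re

# Hypothetical placeholder syntax: {{field_name}}
FIELD = re.compile(r"\{\{(\w+)\}\}")


def fill_fields(template, record):
    """Replace {{field}} placeholders with values from a database record.

    Returns the updated text and a list of fields the record could not
    supply, so a person can review the gaps instead of publishing blanks.
    """
    missing = []

    def substitute(match):
        key = match.group(1)
        value = record.get(key)
        if value is None or value == "":
            missing.append(key)
            return match.group(0)  # keep the placeholder as-is
        return str(value)

    return FIELD.sub(substitute, template), missing


if __name__ == "__main__":
    text, gaps = fill_fields(
        "Weight: {{weight}} kg, Range: {{range}} m",
        {"weight": 1.2, "range": None},
    )
    print(text)
    print("Needs review:", gaps)
```

Reporting the gaps separately keeps the people who maintain the datasheets in the loop, which matters when the hand-maintained files are more accurate than the database.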
You’ll notice that none of these solutions involve XML. All of them, however, involve making content more intelligent: the source files are more than just static representations of what ends up on the printed page. I expect that a move to XML will be phase two for some of these projects, as the importance of the PDF output diminishes.