Palimpsest
Monday, June 23, 2008
 
XPubs: XSL-FO for Documentation Formatting
Mike Miller, Antenna House

For starters, XSL-FO is an XML standard.

XSL-FO is "a pagination markup language describing a rendering vocabulary capturing the semantics of formatting information for paginated presentation." (Ken Holman)

Or, as I like to say, "A document layout described in a text file."
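To make "a document layout described in a text file" a little more concrete, here is a minimal sketch (mine, not from Mike's slides) that uses Python's standard library to write out a one-page FO document. The page size, margins, font settings, and sample text are assumptions chosen for illustration. The element and property names (fo:root, fo:layout-master-set, fo:simple-page-master, fo:region-body, fo:page-sequence, fo:flow, fo:block, widows, orphans, hyphenate) come from the XSL-FO vocabulary.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name for an fo: element."""
    return f"{{{FO_NS}}}{tag}"


def build_minimal_fo(text):
    """Build a one-page XSL-FO tree containing a single block of text."""
    root = ET.Element(fo("root"))

    # Page geometry: one A4 page master with 20 mm margins (illustrative values).
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(
        masters,
        fo("simple-page-master"),
        {
            "master-name": "page",
            "page-width": "210mm",
            "page-height": "297mm",
            "margin-top": "20mm",
            "margin-bottom": "20mm",
            "margin-left": "20mm",
            "margin-right": "20mm",
        },
    )
    ET.SubElement(page, fo("region-body"))

    # Content: a page sequence that flows text into the body region.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})

    # Formatting only: font, widow/orphan limits, hyphenation. No semantics.
    block = ET.SubElement(
        flow,
        fo("block"),
        {
            "font-family": "serif",
            "font-size": "11pt",
            "widows": "2",
            "orphans": "2",
            "hyphenate": "true",
        },
    )
    block.text = text
    return root


fo_text = ET.tostring(build_minimal_fo("Hello, FO."), encoding="unicode")
print(fo_text)
```

In a real workflow you would not write this by hand. An XSLT stylesheet generates the FO from your XML, and an FO engine turns it into PDF.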

XSL-FO is black box formatting. Can't go back and "tweak" the files to fix them. With FO, you're typically talking about a minimum of a couple hundred pages. Much faster to render automatically than to lay out by hand in InDesign or FrameMaker.

First commercial products in 2001 from Antenna House and RenderX. Also, open source FOP from Apache in 2001. FO successful in the sense that both commercial companies are doing quite well.

FO more successful than any technical publishing application except perhaps TeX and FrameMaker. Probably attributable to the availability of open source (free) and trial versions from commercial vendors (free).

XSL-FO is only concerned with visual display of XML data, which means that the FO file has no semantic content, only formatting instructions.

The FO stylesheet specifies:
Advantages:
Antenna House has been personally involved in about 30 different DITA projects.

Most business documents can be formatted automatically as FO. Rule of thumb: "If it's XML, FO can be applied."

Other applications for FO might include faxes, German railway tickets, correspondence from financial institutions and government.

Typesetting is very complex with issues like widows and orphans and hyphenation. Software can handle this. Human typesetters have been removed from the process, and this shows in amateurish mistakes. But you can use FO to configure something that follows typography rules and gives you a professional look and feel.

"Overwhelming benefits" of using FO. Which begs the question: "Why aren't more people using it?" A slide with the benefits of XML showing The Usual (cost, time-to-market, less redundancy, standards-based, localization for cost justification, etc.).

People who use FO: auto manufacturers, cell phone manufacturers, banks, aerospace, government, military, education.

FO not appropriate for documents that are "artistically created."

FO extensions provide support for:
Thus, if you need one of these features, you might get somewhat locked into your rendering engine...the extensions are specific to a particular FO engine.

DITA Open Toolkit reduces the complexity of getting set up and producing PDF. Could be configured and producing PDF in "a couple of hours." (Perhaps, but making it look the way you want is going to take a while.) According to Mike, somewhere between a few days and a few months, depending on the complexity of your requirements.

PDF output from DITA
Stages:
Several software components are required -- DITA Open Toolkit provides all the components you need.

Why not FrameMaker or InDesign?
You need WYSIWYG if:
If you need WYSIWYG, you need a layout engine like FrameMaker or InDesign. If you need WYDSIWYN, you need XSL-FO.

On the low end, FO is free with FOP. Antenna House is the most expensive at $1,250 for a stand-alone license or $5,000 for a server license.

FO supports more languages than any other solution currently available.

Solving the real problem:
XSL-FO is delivering on the XML promise. Don't underestimate it.

First question: Flowing text into a typesetting engine results in line breaks that will cause readers difficulty. And this annoys him (as a professional typesetter). We want powerful, automated formatting AND the ability to do WYSIWYG tweaks. Thinks there is a role for a WYSIWYG stage after the automation bit.

I've noticed this on the BBC, too. British people ask really pointed questions.

And in response, Mike says that Antenna House has a solution for this where you create INX (InDesign XML) content (4 minutes) and then you can pull it into InDesign (half an hour), and do some cleanup.

Do all the XSL-FO tools cover 100% of the FO standard? "No, definitely not."



Wednesday, March 19, 2008
 
WritersUA: DITA pilot techniques
Mark Wallis of IBM ISS on how to run a successful DITA pilot. Some great information in this presentation on how to reduce risks.

He recommends selecting your pilot project based on the following items:
They had one person out of a group of twelve, a "senior in name only" writer, leave because of this transition.

The ideal team for a pilot will need cross-functional and complementary skills:
Some advice on planning your content. (And it's worth noting here that these apply to good writing and topic-oriented content rather than to DITA tools.)
Some interesting discussion of "task support clusters," which include conceptual overviews, related tasks, deep concept, and reference information. (Michael Hughes did a presentation on this earlier today, which I unfortunately was not able to attend.)

They set up a DITA War Room in a small conference room and met at least daily (1.5 to 2 hours per day. Yikes.). They set weekly goals and used small tasks to build momentum.

There was also heavy use of an internal wiki to put up initial "straw man" design, then revise, comment, and discuss.

Layering deliverables
Implementation deliverables were split out into smaller tasks, such as:
For the third time, he points out that they are no longer documenting how to use a check box, so I guess I'll mention it.

Choosing the DITA toolset

Task Modeler (free) for building and managing ditamaps, defining relationships between topics, and creating skeleton topics (stub files).

DITA-compliant editor to edit your topics.

Compiler (part of open source toolkit). Compiler? What are they compiling? HTML Help? Oh. He just referred to Ant as a compiler. Ohhhhhkay.
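For readers who haven't seen a ditamap or a stub topic, here is a rough sketch of what the first item in the toolset list produces. This is not how Task Modeler works internally; it is just a few lines of Python that build a map with one topicref per topic and an empty concept, task, or reference topic for each. The topic IDs and titles are hypothetical, and the DOCTYPE declarations that a real DITA file carries are left out.

```python
import xml.etree.ElementTree as ET

# Body element that goes with each of the three base DITA topic types.
BODY_ELEMENTS = {"concept": "conbody", "task": "taskbody", "reference": "refbody"}


def build_map(title, topics):
    """Build a DITA map with one topicref per (topic_id, topic_type, title)."""
    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = title
    for topic_id, topic_type, _ in topics:
        ET.SubElement(
            ditamap, "topicref", {"href": f"{topic_id}.dita", "type": topic_type}
        )
    return ditamap


def build_stub(topic_id, topic_type, title):
    """Build a skeleton topic: an id, a title, and an empty body."""
    topic = ET.Element(topic_type, {"id": topic_id})
    ET.SubElement(topic, "title").text = title
    ET.SubElement(topic, BODY_ELEMENTS[topic_type])
    return topic


# Hypothetical topics for illustration only.
topics = [
    ("install_overview", "concept", "Installation overview"),
    ("install_product", "task", "Installing the product"),
    ("install_options", "reference", "Installation options"),
]

map_xml = ET.tostring(build_map("Installation guide", topics), encoding="unicode")
stubs = {
    f"{topic_id}.dita": ET.tostring(
        build_stub(topic_id, topic_type, title), encoding="unicode"
    )
    for topic_id, topic_type, title in topics
}

print(map_xml)
for filename, content in stubs.items():
    print(filename, content)
```

The point of the stubs is that the map and the topic structure exist before anyone writes a word, so the team can argue about organization first.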

Proof of concept

They picked a subset of the pilot to do the proof of concept.

The presenter's boss is quoted as saying, "There's no such thing as bad weather, only insufficient clothing." I'm guessing that she's never been to Minnesota in winter.

The objectives for the proof of concept:
They learned that deliverable formats matter because they must deliver several different formats.

Managing costs

Purchase toolsets only for pilot team.

After completing proof of concept (successfully!), invest in tools for the remaining writers.

Wiki

They used their wiki to capture conventions and guidelines.

Improving acceptance

They paid attention to the change management issues. He doesn't mention it here, but I would assume that the combination of an acquisition by IBM plus the requirement to change the authoring environment could have caused significant angst. Their approach included presentations, wiki content, email discussions, and online training.

At the point of transition, DITA boot camp was offered.

They used collaborative walkthroughs, or reviews, to help standardize their content development. Interesting. This sounds as though it could be a) threatening and b) an unbelievable time sink. But just maybe it might also c) help improve the content.

Other lessons learned

Think more, write less. (Don't document the obvious, don't document common user interface conventions, write only if you're really adding value.)

Don't squander your ignorance. (If something makes you stumble in the interface, that will probably also cause problems for your users, so capture it.)

The more structured your content, the easier the transition to DITA.

Documenting the obvious teaches readers to ignore your text, so don't document the obvious.

The handouts are available here: http://www.writersua.com/ohc/suppmatl/



Thursday, January 03, 2008
 
"Once you start down the DITA path, forever will it dominate your destiny"
Eliot Kimber has a lovely article on using DITA for narrative documents. I'm trundling through it, nodding in agreement, and then we have this horror:
[...] DITA offers at least two compelling advantages over any other candidate XML application:
  1. The initial cost of ownership is low, approaching zero, and the ongoing cost of ownership is low.
  2. It offers a number of sophisticated features in terms of modularity, extensibility, and linking that either are not provided by other applications or would cost a prohibitively large amount to build from scratch.

That is, the cost of applying DITA is almost always going to be significantly lower than the cost of any alternative (and at worst will be no more expensive than any other alternative).

Now, he does qualify this statement by saying that these assertions apply only if DITA is a reasonable fit for your problem. But the overall thrust of the argument appears to be that since DITA can do narrative documents (which it was emphatically not designed for), it can potentially be applied to an enormous new set of content.

Ugh.

Before I begin today's DITA-bashing session, I need to point out that we are currently using DITA for several projects here at Scriptorium. DITA slices! DITA dices! DITA advocacy raises your IQ, improves your health, and makes you irresistible. I like DITA just fine.

Moving right along...

"1. The initial cost of ownership is low, approaching zero, and the ongoing cost of ownership is low."

Just because it's free doesn't mean it's cheap. The default output from the DITA Open Toolkit ranges somewhere between unattractive (HTML) and fugly (PDF). If you care about the appearance of your final documents, you are going to have to do a lot of work to get the look and feel you want. And although the OT offers a starting point, customizing it is kind of like a trip to the dentist. The impacted-wisdom-tooth-removing kind of trip.

Getting your output working properly is Not Easy because of the, er, unique design of the OT. If the set of tags you need is small, you might be better off building a nice petite NovelML and then writing the transformations you need for NovelML instead of wrestling with DITA's complexities.

"2. It offers a number of sophisticated features in terms of modularity, extensibility, and linking that either are not provided by other applications or would cost a prohibitively large amount to build from scratch."

I agree that DITA has some lovely features in this area. However, I fail to see how they apply to the example at hand -- a narrative document such as Moby Dick. If you need modularity, extensibility, and linking features, you should consider DITA. If you don't, then these features will just get in the way.

"That is, the cost of applying DITA is almost always going to be significantly lower than the cost of any alternative (and at worst will be no more expensive than any other alternative)."

If DITA is overkill for your requirements, then applying DITA may not be cheaper.

But the issue that upsets me the most is this: when you attack a problem by assuming (or hoping) that DITA will work, you necessarily look for DITA features you can use. You may not think carefully about non-DITA features that you might like to have. For fiction content, I can think of several things that would be quite useful (and for which DITA offers no immediate support):
Of course, you could pervert and/or specialize DITA to support these and other requirements. But if you start with a DITA-shaped box, how likely is it that you will think carefully about the possibilities outside the box?

As Eliot says, the advantages of DITA can be significant. But I fear that a generation of documents will be crammed into DITA, resulting in documents that are not as well structured as they need to be.

I will now await my smackdown from the DITA Disciples.

Signed,

DITA Dissident



Wednesday, January 02, 2008
 
2008 Predictions: They'll keep me humble in 2009
Each year, I write up an internal annual report, which discusses company performance in the previous year, looks at trends, and lays out a strategic plan for the following year. Generally, this report looks great in November and December and is completely obsolete by March (at the latest). Nonetheless, I thought I'd share some of the highlights from this year's analysis. I hope you will share your agreement or disagreement in the comments.

No clear leader for DITA
DITA authoring tools are everywhere. Long-time contenders (FrameMaker, Arbortext, and XMetaL [anyone remember SoftMetal or HotMetal??]) are adding DITA feature support. Many help authoring environments are adding DITA import or export. Several companies are developing web-based DITA authoring tools, and In.Vision Research has a DITA authoring plug-in for Microsoft Word.

The tools proliferation is disconcerting. In the olden days (the early 90s!), serious technical publishing was a fairly easy choice among FrameMaker, Interleaf, and maybe Ventura Publisher. Now, some tools are on the desktop, some are in the browser, some reside inside other tools, and life is much more complex.

Will things look different in five years? Certainly. I doubt, however, that we'll be back to half a dozen (or fewer) contenders. Instead, I think DITA output will become a check-off in the same way that HTML output is now.

Reuse analyzers
Both MadCap Software and Author-It have developed reuse analysis software -- Analyzer and XTend, respectively. Most of us are familiar with translation memory tools, which try to match new content to be translated against existing content in the TM database. The reuse analyzers do similar work, but in the source language. As you write, the software compares new content to existing content and recommends matches.
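I don't know how either product does its matching, so take the following as a toy illustration of the general idea and nothing more. It uses Python's difflib to score a newly written sentence against a small pool of existing sentences and returns the ones that are close enough to suggest for reuse. The sample sentences and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def suggest_reuse(new_sentence, existing_sentences, threshold=0.8):
    """Return (score, sentence) pairs for existing sentences similar to the new one.

    Scores are difflib similarity ratios between 0.0 and 1.0, compared
    case-insensitively. Results are sorted with the best match first.
    """
    matches = []
    for candidate in existing_sentences:
        score = SequenceMatcher(None, new_sentence.lower(), candidate.lower()).ratio()
        if score >= threshold:
            matches.append((score, candidate))
    return sorted(matches, reverse=True)


# Hypothetical content pool for illustration only.
existing = [
    "Click Save to store your changes.",
    "Select the printer you want to use.",
    "Restart the application to apply the update.",
]

suggestions = suggest_reuse("Click Save to store the changes.", existing)
for score, sentence in suggestions:
    print(f"{score:.2f}  {sentence}")
```

A real tool would need to work at scale and cope with inline markup, but the core loop is the same one translation memory uses: compare, score, and recommend.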

This is such an elegant, obvious idea that I can't believe it's new. But I haven't seen this type of tool in desktop-level software before.

Web 2.0 integration
User-generated content, such as blogs, wikis, and forums (not to mention YouTube), is on a collision course with "professional" content, such as user assistance and documentation created by technical writers. The complaints about the amateurs butting in where they don't belong must be painfully familiar to those who remember the rise of desktop publishing software and the destruction of the vast majority of the professional typesetting business.

Note: I laid out my first magazine in PageMaker. Version 1. What little manual paste-up I did was not very attractive.

Note to young people: The expression "cut and paste" is used because in the olden days, your parents used to use scissors ("cut") and glue ("paste") to move things around on a page layout. And "strippers" didn't always use poles. But I digress...

People who are paid to create technical content need to understand what user-generated content will and will not do. (Shameless plug: I'm doing a session on this topic at WritersUA in Portland, OR, this year.)

Global business
We have our fair share of customers in North America, but an increasing number of our clients are outside North America or have significant operations in multiple locations around the world. The implications for technical communicators are global audiences, global customers (internal and external), and a requirement to work well with people from all over the world.

This is an area where I believe that U.S. communicators face some significant challenges.

Flash
I expect Flash to become the next Next Big Thing. Flash technology enables the creation of interactive applications that run in a browser (or offline with AIR, which is also fascinating). Flash is widely used for games, but for our purposes, its role in e-learning applications is more important.

Traditional classroom training is effective (when you have a good trainer), but it's also expensive and it doesn't scale well -- the more people need training, the more costs rise. And furthermore, if the students are scattered literally all over the world, the costs of assembling them all in one location are astounding. I firmly believe that e-learning is less effective than a great classroom experience (of course, I'm biased since I am an instructor myself), but e-learning has some significant advantages -- like eliminating travel requirements and reducing overall cost.

Flash has almost nothing in common with the current Next Big Thing -- XML. XML is markup, text, human-readable, and geeky. Basic Flash is like Illustrator with an extra dimension (time). Advanced Flash is an application development environment.


So there you have my list of important developments for 2008. Do you agree? Disagree? Have additions?



Friday, May 04, 2007
 
Why XML and structured authoring is a tough transition
Found on technicalwriter's blog:
"There are several applications that incorporate features for DITA use, such as XMetal and Altova Authentic, but how much value do they provide? (Looking over the online documentation for XMetal, you will see some pretty shaky formatting and copyfitting.)"

There may well be formatting and copyfitting issues. Wouldn't surprise me at all. But talk about missing the forest for the trees!

DITA/XML/structured authoring are important because they improve how information is stored. To question their value because somebody produced documentation using them that doesn't look so great...let's try an analogy:
Last week, I went to a restaurant and the food was terrible. I looked in the kitchen and saw Calphalon pots and pans. I conclude that you should not buy Calphalon because the food they produce is terrible.
The quality of your food is determined by things such as the quality of the ingredients and the skill of the chef. The pan you choose does contribute -- it helps to use the right size and a high-quality pan, but to dismiss DITA because one example doesn't look quite right is pretty much like dismissing Calphalon because somebody once cooked something that didn't taste very good in it.

PS I like Calphalon. And I have produced my share of problematic entrees.
PPS DITA is not right for everybody.
