Palimpsest
Monday, June 23, 2008
 
XPubs: XSL-FO for Documentation Formatting
Mike Miller, Antenna House

For starters, XSL-FO is an XML standard.

XSL-FO is "a pagination markup language describing a rendering vocabulary capturing the semantics of formatting information for paginated presentation." (Ken Holman)

Or, as I like to say, "A document layout described in a text file."

XSL-FO is black box formatting. You can't go back and "tweak" the files to fix them. With FO, you're typically talking about a minimum of a couple hundred pages. It's much faster to render automatically than by hand in InDesign or FrameMaker.

First commercial products in 2001 from Antenna House and RenderX. Also, open source FOP from Apache in 2001. FO successful in the sense that both commercial companies are doing quite well.

FO more successful than any other technical publishing application other than perhaps TeX and FrameMaker. Probably attributable to the availability of open source (free) and trial versions from commercial vendors (free).

XSL-FO is only concerned with visual display of XML data, which means that the FO file has no semantic content, only formatting instructions.
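
To make that concrete, here is a minimal sketch, in JavaScript, that writes out a tiny FO file as plain text. The function names (`escapeXml`, `buildFo`), the page dimensions, and the font settings are my own assumptions for illustration, not anything from Mike's slides. Notice that the title goes in as a "title" but comes out as nothing more than a bold 18pt block: the FO keeps the formatting and drops the meaning.

```javascript
// Hypothetical helper: escape text so it is safe inside XML element content.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a minimal XSL-FO document as a string (illustrative values only).
function buildFo(title, paragraph) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">',
    '  <fo:layout-master-set>',
    '    <fo:simple-page-master master-name="page"',
    '        page-width="8.5in" page-height="11in" margin="1in">',
    '      <fo:region-body/>',
    '    </fo:simple-page-master>',
    '  </fo:layout-master-set>',
    '  <fo:page-sequence master-reference="page">',
    '    <fo:flow flow-name="xsl-region-body">',
    '      <!-- The "title" is now just a bold 18pt block. -->',
    '      <fo:block font-size="18pt" font-weight="bold">' + escapeXml(title) + '</fo:block>',
    '      <fo:block font-size="11pt">' + escapeXml(paragraph) + '</fo:block>',
    '    </fo:flow>',
    '  </fo:page-sequence>',
    '</fo:root>',
  ].join("\n");
}

// Print the FO text; an FO engine would take a file like this as its input.
console.log(buildFo("Release Notes", "Fixed widows & orphans."));
```

The output is just text, which is the whole point: nothing in it says "title" or "paragraph," only how big and how bold each block should be.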

The FO stylesheet specifies:
Advantages:
Antenna House has been personally involved in about 30 different DITA projects.

Most business documents can be formatted automatically as FO. Rule of thumb: "If it's XML, FO can be applied."

Other applications for FO might include faxes, German railway tickets, correspondence from financial institutions and government.

Typesetting is very complex, with issues like widows and orphans and hyphenation. Software can handle this. Human typesetters have been removed from the process, and this shows in amateurish mistakes. But you can use FO to configure something that follows typography rules and gives you a professional look and feel.

"Overwhelming benefits" of using FO. Which begs the question: "Why aren't more people using it?" A slide with the benefits of XML showing The Usual (cost, time-to-market, less redundancy, standards-based, localization for cost justification, etc.).

People who use FO: auto manufacturers, cell phone manufacturers, banks, aerospace, government, military, educational

FO not appropriate for documents that are "artistically created."

FO extensions provide support for:
Thus, if you need one of these features, you might get somewhat locked into your rendering engine...the extensions are specific to a particular FO engine.

DITA Open Toolkit reduces the complexity of getting set up and producing PDF. You could be configured and producing PDF in "a couple of hours." (Perhaps, but making it look the way you want is going to take a while.) According to Mike, that takes somewhere between a few days and a few months, depending on the complexity of your requirements.

PDF output from DITA
Stages:
Several software components are required -- DITA Open Toolkit provides all the components you need.

Why not FrameMaker or InDesign?
You need WYSIWYG if:
If you need WYSIWYG, you need a layout engine like FrameMaker or InDesign. If you need WYDSIWYN, you need XSL-FO.

On the low end, FO is free with FOP. Antenna House is the most expensive, at $1,250 for a stand-alone license or $5,000 for a server license.

FO supports more languages than any other solution currently available.

Solving the real problem:
XSL-FO is delivering on the XML promise. Don't underestimate it.

First question: Flowing text into a typesetting engine results in line breaks that will cause readers difficulty. And this annoys him (as a professional typesetter). We want powerful, automated formatting AND the ability to do WYSIWYG tweaks. He thinks there is a role for a WYSIWYG stage after the automation bit.

I've noticed this on the BBC, too. British people ask really pointed questions.

And in response, Mike says that Antenna House has a solution for this where you create INX (InDesign XML) content (4 minutes) and then you can pull it into InDesign (half an hour), and do some cleanup.

Do all the XSL-FO tools cover 100% of the FO standard? "No, definitely not."



Wednesday, March 19, 2008
 
WritersUA: Day 3, Morning
Dave Gash (hypertrain.com) leads off the festivities with a discussion of the UA Holy Grail. And no, it's not DITA.

He is discussing True Separation of Content, Structure, Format, and Behavior.

Interesting, because we normally hear about separation of content and presentation -- he's making finer distinctions.

According to Dave, the current authoring method is to use WYSIWYG and code editors, often in combination. And as we work, we insert what's needed wherever it's needed. The result is that documents work -- once -- but are very difficult or impossible to update, maintain, and control.

Spaghetti-code documents make our own jobs harder.

The conventional wisdom is to separate content and formatting. Content is "stuff on the page"; therefore format must be "everything that is not content."

Content could include HTML, CSS, and JavaScript. Separating out CSS still leaves "junk" in the content pages.

Dave proposes a more refined model: content, structure, formatting, and behavior.

* Content is XML
* Structure is XSLT
* Format is CSS
* Behavior is JavaScript (JS)

This will be more maintainable, which means:

* Ability to change any components without breaking the others
* Ability to reuse any component in other pages or projects
* Ability to control each component's resource allocation (that is, who creates each piece?)

How to improve your pages:

1. Identify and externalize JS behavior.

* Find the embedded scripts (<script> tags) and replace them with a reference to an external foo.js file.

<script language="javascript" src="foo.js"></script>
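
As a sketch of what might end up in that external file, here is a hypothetical foo.js. The function names and the "expanded" class are my own inventions for illustration, not Dave's code; the idea is simply that the behavior lives in one file that any page can reference.

```javascript
// foo.js -- hypothetical external script; all names here are illustrative.

// Pure helper: add the class if it is missing, remove it if present,
// and return the resulting space-separated class string.
function toggleClass(classString, name) {
  const classes = classString.split(/\s+/).filter(Boolean);
  const index = classes.indexOf(name);
  if (index === -1) {
    classes.push(name);
  } else {
    classes.splice(index, 1);
  }
  return classes.join(" ");
}

// Behavior that used to be embedded in the page: expand or collapse a block.
// Works on any object with a className property, such as a DOM element.
function toggleExpanded(element) {
  element.className = toggleClass(element.className, "expanded");
}
```

Because the helper only touches `className`, the same file can be shared across pages without knowing anything about their content.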

2. Identify JS behavior that could be CSS and convert it to CSS rules.

"If you can encode with CSS and make it declarative instead of procedural, you're way ahead of the game."

* Catch "sneaky" JavaScript behavior, such as mouseover events, that could be CSS rather than JavaScript. Event handlers that call JavaScript almost always start with "on" -- easy to identify and many can be replaced with CSS hover pseudoclasses.

.expterm:hover {font-style:italic; }
.expterm {text-decoration:none;}

Removing the code from the HTML greatly simplifies the page.
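
For contrast, the procedural version that those two CSS rules replace might look something like the sketch below. The function names and the inline handlers shown in the comment are hypothetical (mine, not from Dave's examples); the point is only how much more machinery the JavaScript route needs for the same effect.

```javascript
// Hypothetical "before" code: italicize a term on hover using event handlers.
// In the page it would be wired up with inline attributes such as:
//   <a class="expterm" onmouseover="italicOn(this)" onmouseout="italicOff(this)">
function italicOn(element) {
  element.style.fontStyle = "italic";
}

function italicOff(element) {
  element.style.fontStyle = "normal";
}

// The declarative CSS equivalent needs no functions and no attributes:
//   .expterm:hover { font-style: italic; }
```

Every term needs both "on" attributes in the HTML, which is exactly the clutter the CSS version removes.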

3. Identify and externalize CSS styles; recode any local formatting as classes.

Get rid of "deprecated tags and doo-doo like that."

Get rid of style attributes, font tags, b tags (become span tags).

"It's said that comments are for someone who comes behind you six months later and needs to update your code. This is not true. Comments are so that YOU can figure out six months later what you were doing in the code."

So you should comment your code.

4. Semantically mark up content as XML.

Dave's definition of semantic markup? "call things what they are."

5. Identify desired HTML output structure, write XSL transforms to produce it.

So...what's in it for me?

Discrete, maintainable, controllable components
* You can change one component without breaking others
* You can share components with other pages
* You can separate work load by skill sets
* Set it and forget it! (for everything except the content)

Code examples are available at Dave's web site: www.hypertrain.com

Questions about tools. No, he won't recommend tools. Question about schemas...Dave says the first thing that comes to mind is...DocBook???

Yikes. In an answer to a question about print and XSL-FO, somebody recommended asking....me! (I swear I didn't pay her for that, and I don't think she even knew I was in the room. Quite surreal.)

##

My only disagreement with this session is with the separation of XML as "content" and XSLT as "structure." It's my opinion that the XML includes the structure, and XSLT just gives me a way to express that structure in HTML or other formats.

I also question some of his tag names, such as <expander> for a term/definition group. The expander tag name is really a description of the desired behavior (expandable text) rather than the semantic function of the content (definition of a term). I would probably choose something like <glossaryitem> for the container, leaving open the option of changing the behavior to something other than expansion in the future. Same quibble with <ddblock> (drop-down block).

I do like the use of the tag name for step results.

Great presentation from an energetic presenter whose motto is, "If I have to be awake, you do, too!"

Side note: I'm pretty sure that if you tied Dave's hands behind his back, he would lose his ability to speak.



Tuesday, May 01, 2007
 
Writing better XSL
Jeni Tennison has a new blog. Her latest post has tips on when to use template matching, named templates, and for-each statements.

In my experience, most people who are new to XSL overuse for-each loops, because they most closely resemble familiar programming constructs.
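
To show the difference in terms programmers already know, here is a rough JavaScript analogy. It is not XSLT and not from Jeni's post; the data and all names are made up. The first function works like a for-each loop that decides everything in one place. The second works more like template matching: each element name gets its own small rule, and a dispatcher plays the role of apply-templates, falling back to "just process the children" when no rule matches.

```javascript
// A tiny XML-like tree as plain objects (hypothetical data).
const doc = {
  name: "steps",
  children: [
    { name: "step", children: ["Open the file."] },
    { name: "note", children: ["Save your work first."] },
  ],
};

// Pull style, like for-each: one function loops and handles every case.
function renderWithLoop(root) {
  let html = "<ol>";
  for (const child of root.children) {
    if (child.name === "step") {
      html += "<li>" + child.children.join("") + "</li>";
    } else if (child.name === "note") {
      html += '<li class="note">' + child.children.join("") + "</li>";
    }
  }
  return html + "</ol>";
}

// Push style, like template matching: one small rule per element name.
const templates = {
  steps: (node) => "<ol>" + applyTemplates(node.children) + "</ol>",
  step: (node) => "<li>" + applyTemplates(node.children) + "</li>",
  note: (node) => '<li class="note">' + applyTemplates(node.children) + "</li>",
};

// Dispatcher: find the rule for each node; with no rule, process its children.
function applyTemplates(nodes) {
  return nodes
    .map((node) => {
      if (typeof node === "string") return node; // text node: copy through
      const rule = templates[node.name] || ((n) => applyTemplates(n.children));
      return rule(node);
    })
    .join("");
}

// Both styles produce the same HTML for this input.
console.log(renderWithLoop(doc) === applyTemplates([doc])); // prints true
```

The loop version has to grow a new branch for every new element and every new nesting level; the rule version just gets another entry in the table.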
