Palimpsest
Monday, June 23, 2008
XPubs: XSL-FO for Documentation Formatting
Mike Miller, Antenna House
For starters, XSL-FO is an XML standard.
XSL-FO is "a pagination markup language describing a rendering vocabulary capturing the semantics of formatting information for paginated presentation." (Ken Holman)
Or, as I like to say, "A document layout described in a text file."
XSL-FO is black-box formatting. You can't go back and "tweak" the files to fix them. With FO, you're typically talking about a minimum of a couple hundred pages. Much faster to render automatically than by hand in InDesign or FrameMaker.
First commercial products in 2001 from Antenna House and RenderX. Also, open source FOP from Apache in 2001. FO successful in the sense that both commercial companies are doing quite well.
FO more successful than any other technical publishing application other than perhaps TeX and FrameMaker. Probably attributable to the availability of open source (free) and trial versions from commercial vendors (free).
XSL-FO is only concerned with visual display of XML data, which means that the FO file has no semantic content, only formatting instructions.
The FO stylesheet specifies:
- page areas and sets of pages to be used to compose a document for paper (master pages)
- Text flows, areas on pages into which the text and graphics are filled
- Blocks within flow areas (paragraphs)
- Inline areas (character-level formatting)
Advantages of FO:
- Processing and formatting are consistent and automatic.
- Formatting rules are stored separately from the data.
- FO is non-proprietary and human-readable (well, sort of)
- FO is less complicated than programming in Java, Perl, and the like
- Can use stylesheets with different XSLT processors (DITA Open Toolkit)
- Easier integration with other XML standards compliant applications (not trivial, but much easier than other non-standard approaches)
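To make the master page / flow / block / inline breakdown concrete, here is a minimal sketch of an FO file. It was not shown in the session; the master name "body-page", the A4 page size, and the sample text are my own assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Master pages: the page geometry (name and sizes are assumptions) -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="body-page"
        page-width="210mm" page-height="297mm"
        margin-top="20mm" margin-bottom="20mm"
        margin-left="25mm" margin-right="25mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- A set of pages built from that master -->
  <fo:page-sequence master-reference="body-page">

    <!-- Text flow: content poured into the body region -->
    <fo:flow flow-name="xsl-region-body">

      <!-- Block: a paragraph -->
      <fo:block font-size="11pt" space-after="6pt">
        A paragraph with
        <!-- Inline: character-level formatting -->
        <fo:inline font-style="italic">italic</fo:inline>
        text in it.
      </fo:block>

    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

In practice you rarely write this by hand. An XSLT stylesheet generates it from your XML, and the FO formatter turns it into PDF.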
Most business documents can be formatted automatically as FO. Rule of thumb: "If it's XML, FO can be applied."
Other applications for FO might include faxes, German railway tickets, correspondence from financial institutions and government.
Typesetting is very complex, with issues like widows and orphans and hyphenation. Software can handle this. Human typesetters have been removed from the process, and this shows in amateurish mistakes. But you can use FO to configure something that follows typography rules and gives you a professional look and feel.
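As a rough illustration of what "configuring typography rules" can look like, here is a sketch of an FO fragment using standard XSL-FO properties for widows, orphans, hyphenation, and keeps. The specific values and the sample text are my assumptions, not anything Mike showed, and how well each property is honored depends on the formatter.

```xml
<fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format"
    flow-name="xsl-region-body">

  <!-- Keep a heading on the same page as the paragraph that follows it -->
  <fo:block font-weight="bold" font-size="14pt"
      keep-with-next.within-page="always">
    Installing the product
  </fo:block>

  <!-- widows/orphans: minimum lines kept together at the top/bottom of a page.
       hyphenate + language: turn on hyphenation using English rules.
       The two character counts limit how short a hyphenated fragment may be. -->
  <fo:block text-align="justify"
      widows="2" orphans="2"
      hyphenate="true" language="en"
      hyphenation-remain-character-count="3"
      hyphenation-push-character-count="3">
    Body text that the formatter is free to break across lines and pages,
    within the limits set by the properties above.
  </fo:block>

</fo:flow>
```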
"Overwhelming benefits" of using FO. Which raises the question: "Why aren't more people using it?" A slide with the benefits of XML showing The Usual (cost, time-to-market, less redundancy, standards-based, localization for cost justification, etc.).
People who use FO: auto manufacturers, cell phone manufacturers, banks, aerospace, government, military, educational
FO not appropriate for documents that are "artistically created."
FO extensions provide support for:
- Document info in PDF
- Bookmarks for PDF
- Column footnotes
- Revision bars
- MathML
- Embedding PDF within PDF
- Column rules
- Punctuation spacing
- Table autospace
- Floats
- Advanced hyphenation
- Barcodes
There are several hundred extensions altogether. Antenna House uses extensions to meet multilingual requirements, such as special spacing in Japanese or justification in Arabic through kashidas.
The DITA Open Toolkit reduces the complexity of getting set up to produce PDF. You could be configured and producing PDF in "a couple of hours." (Perhaps, but making it look the way you want is going to take a while.) According to Mike, somewhere between a few days and a few months, depending on the complexity of your requirements.
PDF output from DITA can be produced through:
- XSL-FO
- FrameMaker
- troff
Processing happens in two stages:
- Preprocessing. Information is parsed and assembled.
- Transformation. Output is formatted and generated.
Why not FrameMaker or InDesign?
- Formatting is the tip of the iceberg. (WYSIWYG)
- WYDSIWYN -- What you don't see is what you need, which includes content management, automated formatting, multilingual formatting, global access, project tracking, electronic delivery, network integration
- You need to manually lay out pages.
- No fixed page style
- Need to modify page layout
- Unstructured document formats
- Document format is continuously changing
- Unstructured content
On the low end, FO is free with FOP. Antenna House is the most expensive at $1,250 for a stand-alone license or $5,000 for a server license.
FO supports more languages than any other solution currently available.
Solving the real problem:
- Improve the total process, not just individual tasks
- Improve organizational effectiveness
First question: Flowing text into a typesetting engine results in line breaks that will cause readers difficulty. And this annoys him (as a professional typesetter). We want powerful, automated formatting AND the ability to do WYSIWYG tweaks. He thinks there is a role for a WYSIWYG stage after the automation bit.
I've noticed this on the BBC, too. British people ask really pointed questions.
And in response, Mike says that Antenna House has a solution for this where you create INX (InDesign XML) content (4 minutes) and then you can pull it into InDesign (half an hour), and do some cleanup.
Do all the XSL-FO tools cover 100% of the FO standard? "No, definitely not."
Labels: conferences, dita, xml, xpubs, xsl
Wednesday, March 19, 2008
WritersUA: Day 3, Morning
Dave Gash (hypertrain.com) leads off the festivities with a discussion of the UA Holy Grail. And no, it's not DITA.
He is discussing True Separation of Content, Structure, Format, and Behavior.
Interesting, because we normally hear about separation of content and presentation -- he's making finer distinctions.
According to Dave, the current authoring method is to use WYSIWYG and code editors, often in combination. And as we work, we insert what's needed wherever it's needed. The result is that documents work -- once -- but are very difficult or impossible to update, maintain, and control.
Spaghetti-code documents make our own jobs harder.
The conventional wisdom is to separate content and formatting. Content is "stuff on the page"; therefore format must be "everything that is not content."
Content could include HTML, CSS, and JavaScript. Separating out CSS still leaves "junk" in the content pages.
Dave proposes a more refined model: content, structure, formatting, and behavior.
* Content is XML
* Structure is XSLT
* Format is CSS
* Behavior is JavaScript (JS)
This will be more maintainable, which means:
* Ability to change any components without breaking the others
* Ability to reuse any component in other pages or projects
* Ability to control each component's resource allocation (that is, who creates each piece?)
How to improve your pages:
1. Identify and externalize JS behavior.
* Find the embedded scripts (<script> tags) and replace them with a reference to an external foo.js file.
<script type="text/javascript" src="foo.js"></script>
2. Identify JS behavior that could be CSS and convert it to CSS rules.
"If you can encode with CSS and make it declarative instead of procedural, you're way ahead of the game."
* Catch "sneaky" JavaScript behavior, such as mouseover events, that could be CSS rather than JavaScript. Event handlers that call JavaScript almost always start with "on" -- easy to identify, and many can be replaced with CSS :hover pseudo-classes.
.expterm:hover {font-style:italic; }
.expterm {text-decoration:none;}
Removing the code from the HTML greatly simplifies the page.
3. Identify and externalize CSS styles, recode any local formatting as classes.
Get rid of "deprecated tags and doo-doo like that."
Get rid of style attributes, font tags, b tags (become span tags).
"It's said that comments are for someone who comes behind you six months later and needs to update your code. This is not true. Comments are so that YOU can figure out six months later what you were doing in the code."
So you should comment your code.
4. Semantically mark up content as XML.
Dave's definition of semantic markup? "call things what they are."
5. Identify desired HTML output structure, write XSL transforms to produce it.
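For step 5, a transform might look something like the sketch below. This is not Dave's code (his examples are on his site). The element names topic, title, and para and the foo.css file name are hypothetical; foo.js is the external script from step 1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes"/>

  <!-- Root element (hypothetical <topic>): build the HTML shell and
       link the external CSS (format) and JS (behavior) files -->
  <xsl:template match="/topic">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
        <link rel="stylesheet" type="text/css" href="foo.css"/>
        <script type="text/javascript" src="foo.js"></script>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="para"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each hypothetical <para> becomes an HTML paragraph -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

The XML source carries no formatting or scripting at all. The stylesheet decides the HTML structure and points at the CSS and JS files, so each of the four pieces lives in its own file.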
So...what's in it for me?
Discrete, maintainable, controllable components
* You can change one component without breaking others
* You can share components with other pages
* You can separate work load by skill sets
* Set it and forget it! (for everything except the content)
Code examples are available at Dave's web site: www.hypertrain.com
Questions about tools. No, he won't recommend tools. Question about schemas...Dave says the first thing that comes to mind is...DocBook???
Yikes. In an answer to a question about print and XSL-FO, somebody recommended asking....me! (I swear I didn't pay her for that, and I don't think she even knew I was in the room. Quite surreal.)
##
My only disagreement with this session is with the separation of XML as "content" and XSLT as "structure." It's my opinion that the XML includes the structure, and XSLT just gives me a way to express that structure into HTML or other formats.
I also question some of his tag names, such as <expander> for a term/definition group. The expander tag name is really a description of the desired behavior (expandable text) rather than the semantic function of the content (definition of a term). I would probably choose something like <glossaryitem> for the container, leaving open the option of changing the behavior to something other than expansion in the future. Same quibble with <ddblock> (drop-down block).
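Here is a sketch of what I mean. Only <glossaryitem> comes from my suggestion above; the <glossary>, <term>, and <definition> element names and the sample text are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<glossary>
  <!-- Named for what the content is, not for how it behaves on screen -->
  <glossaryitem>
    <term>XSL-FO</term>
    <definition>An XML vocabulary for describing paginated layout.</definition>
  </glossaryitem>
</glossary>
```

Whether a glossary item expands, pops up, or simply prints inline is then a decision for the transform and the CSS/JS, and the content never has to change.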
I do like the use of the
Great presentation from an energetic presenter whose motto is, "If I have to be awake, you do, too!"
Side note: I'm pretty sure that if you tied Dave's hands behind his back, he would lose his ability to speak.
Labels: presentations, writersua2008, xml, xsl
Tuesday, May 01, 2007
Writing better XSL
Jeni Tennison has a new blog. Her latest post has tips on when to use template matching, named templates, and for-each statements.
In my experience, most people who are new to XSL overuse for-each loops, because they most closely resemble familiar programming constructs.
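A small sketch of the difference, written by me and not taken from Jeni's post. The list and item element names are hypothetical, and the "pull" mode exists only so both versions can sit side by side in one valid stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- for-each style: the parent template decides how every child
       is rendered, much like a loop in a procedural language -->
  <xsl:template match="list" mode="pull">
    <ul>
      <xsl:for-each select="item">
        <li><xsl:value-of select="."/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>

  <!-- Template-matching style: the parent delegates, and each
       child element is handled by its own rule -->
  <xsl:template match="list">
    <ul>
      <xsl:apply-templates select="item"/>
    </ul>
  </xsl:template>

  <xsl:template match="item">
    <li><xsl:apply-templates/></li>
  </xsl:template>

</xsl:stylesheet>
```

The for-each version flattens each item to plain text with value-of. The matching version keeps processing whatever is inside the item, so adding support for inline markup later means adding one more template instead of rewriting the loop.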
Labels: xsl
