Perils of DITA publishing, part 6: EPUB and Kindle
In which we jump through flaming hoops for EPUB and Kindle.
For the EPUB version of Content Strategy 101, we decided to create a DITA Open Toolkit plugin instead of going the usual route of hand-compiling the HTML, manifest, and table of contents package. Among other things, we wanted an automated way to handle the indexing, as well as the part and chapter numbering and cover image. Starting with the OT’s base XHTML plugin, we created overrides to the default XSLT templates to incorporate our in-house styles and custom fonts. This is standard fare, and easily achieved with small changes to the base plugin. Considering the extent of the work required to create the full EPUB package, however, we opted to create an altogether new transform type in the plugin.xml file and went to work from there.
Every EPUB requires manifest (OPF) and table of contents (NCX) files (see the EPUB specification for details), so we knew we needed additional ANT targets to crawl through the DITA sources and collect the pertinent information. This required creating a new build file for our plugin that would pull from the ANT targets in the base XHTML transforms and add our new targets alongside.
For the EPUB TOC, we created an ANT target to loop through the bookmap and write the headings you see in the left-hand column of the EPUB reader. A couple of challenges arose here, as our part and cover information was stored in <data> elements in the bookmap, so there weren’t HTML files to link to. We got around this by writing additional templates to create HTML files for the cover image and parts information based on corresponding metadata in the map file. We also wanted the TOC file to display the chapter and part numbers, so we had to do some wrangling to make our transforms pick off the chapters and parts and to number them appropriately without interfering with the order of the TOC itself. We did this by counting the parts and chapters and checking their positions against the number of preceding siblings.
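To illustrate the numbering idea, here is a minimal Python sketch (not the XSLT from our plugin). It assumes a simplified bookmap where parts and chapters carry a navtitle attribute; the sample map, the number_toc name, and the "Part N:" label format are all made up for the example. A pair of running counters stands in for counting preceding siblings in XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified bookmap used only for this illustration.
BOOKMAP = """
<bookmap>
  <frontmatter><preface href="preface.dita" navtitle="Preface"/></frontmatter>
  <part navtitle="Getting started">
    <chapter href="c1.dita" navtitle="Why strategy"/>
    <chapter href="c2.dita" navtitle="Business goals"/>
  </part>
  <part navtitle="Implementation">
    <chapter href="c3.dita" navtitle="Tools"/>
  </part>
</bookmap>
"""


def number_toc(bookmap_xml):
    """Return TOC labels in document order, numbering parts and chapters."""
    root = ET.fromstring(bookmap_xml)
    labels = []
    part_no = 0
    chapter_no = 0
    # iter() walks the tree in document order, so TOC order is preserved
    # while the counters track how many parts/chapters came before.
    for elem in root.iter():
        title = elem.get("navtitle")
        if title is None:
            continue  # containers such as <bookmap> or <frontmatter>
        if elem.tag == "part":
            part_no += 1
            labels.append(f"Part {part_no}: {title}")
        elif elem.tag == "chapter":
            chapter_no += 1  # chapters are numbered continuously across parts
            labels.append(f"Chapter {chapter_no}: {title}")
        else:
            labels.append(title)  # unnumbered entries, e.g. the preface
    return labels


if __name__ == "__main__":
    for label in number_toc(BOOKMAP):
        print(label)
```

Numbering chapters continuously across parts is a choice made for this sketch; restarting the chapter counter inside each part would only require resetting chapter_no when a part is encountered.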
The manifest file has two parts: the manifest itself and the spine block, which corresponds to the TOC NCX file and sets the “play order” for the book. For the manifest section, file order isn’t as big a deal, but you must account for every single file in the package, and you must designate the cover image as such using the @properties attribute in the cover’s item entry (more on this at the Threepress blog). To achieve this, we added another ANT target that does two things: 1) peek in the source directories, drum up a list of all HTML, image, CSS, and font files, and give them unique IDs that correspond to the entries in the spine block, and 2) crawl the bookmap itself to set the order of the spine block, including entries for the cover and parts files generated at runtime.
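The two jobs of that target can be sketched roughly as follows. This is an illustrative Python version, not our ANT/XSLT code; the function name, the file list, the item1, item2, … ID scheme, and the small media-type table are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Small illustrative lookup; a real package may need more entries.
MEDIA_TYPES = {
    ".html": "application/xhtml+xml",  # assumes the HTML is well-formed XHTML
    ".css": "text/css",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".otf": "application/vnd.ms-opentype",
    ".ncx": "application/x-dtbncx+xml",
}


def build_manifest_and_spine(files, reading_order, cover_image):
    """Build <manifest> and <spine> elements for an OPF file.

    files:         every file in the package (paths relative to the OPF)
    reading_order: content files in bookmap order
    cover_image:   path of the cover image, flagged via @properties
    """
    manifest = ET.Element("manifest")
    spine = ET.Element("spine")
    ids = {}
    for n, href in enumerate(sorted(files), start=1):
        item_id = f"item{n}"  # hypothetical ID scheme, unique per file
        ids[href] = item_id
        suffix = PurePosixPath(href).suffix.lower()
        attrs = {"id": item_id, "href": href, "media-type": MEDIA_TYPES[suffix]}
        if href == cover_image:
            attrs["properties"] = "cover-image"
        ET.SubElement(manifest, "item", attrs)
        if suffix == ".ncx":
            spine.set("toc", item_id)  # point the spine at the NCX item
    # Manifest order is arbitrary; spine order must follow the book.
    for href in reading_order:
        ET.SubElement(spine, "itemref", idref=ids[href])
    return manifest, spine


if __name__ == "__main__":
    files = ["cover.html", "images/cover.png", "c1.html", "style.css", "toc.ncx"]
    manifest, spine = build_manifest_and_spine(
        files, ["cover.html", "c1.html"], "images/cover.png"
    )
    print(ET.tostring(manifest, encoding="unicode"))
    print(ET.tostring(spine, encoding="unicode"))
```

Keeping a single href-to-ID dictionary is what guarantees that each spine itemref points at an ID that really exists in the manifest.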
Index your heart out
DITA indexing has its own attendant heartaches, which Alan Pringle heroically overcame for this book. For the EPUB, to keep it (relatively) simple, we created a further ANT task to crawl the DITA files and compile all the index entries into an intermediate file for grouping and sorting. From there, the index gets written to an HTML file for inclusion in our manifest and TOC files.
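As a rough picture of the collect, group, and sort step, here is a simplified Python sketch (our real process ran through ANT and an intermediate file). It assumes each topic maps to one output HTML file and treats every indexterm as a flat entry, ignoring nesting; the collect_index name and the sample topics are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample input: output HTML file name -> DITA topic source.
TOPICS = {
    "c1.html": "<topic id='a'><title>One</title><prolog><metadata><keywords>"
               "<indexterm>XSLT</indexterm><indexterm>EPUB</indexterm>"
               "</keywords></metadata></prolog></topic>",
    "c2.html": "<topic id='b'><title>Two</title><prolog><metadata><keywords>"
               "<indexterm>Ant</indexterm><indexterm>XSLT</indexterm>"
               "</keywords></metadata></prolog></topic>",
}


def collect_index(topics):
    """Return {letter: [(term, [html files])]} sorted case-insensitively."""
    pages = defaultdict(set)
    for html_name, xml_text in topics.items():
        root = ET.fromstring(xml_text)
        # Simplification: nested indexterms are collected as flat entries.
        for term in root.iter("indexterm"):
            label = (term.text or "").strip()
            if label:
                pages[label].add(html_name)
    grouped = defaultdict(list)
    for label in sorted(pages, key=str.casefold):
        grouped[label[0].upper()].append((label, sorted(pages[label])))
    return dict(grouped)


if __name__ == "__main__":
    for letter, entries in collect_index(TOPICS).items():
        print(letter)
        for term, files in entries:
            print("  ", term, "->", ", ".join(files))
```

From a structure like this, writing the index HTML is a matter of emitting one heading per letter and one link per file under each term.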
Fun with mimetype
Next, in our plugin build file, we created an ANT task to clean up the output directories and create the EPUB package. An .epub file is just an ordinary ZIP file by another name, and of course zipping packages is no problem in ANT. However, every EPUB requires not just a mimetype file, but an uncompressed mimetype file. When zipping in ANT, you can set the compression level to zero to forgo compression of any kind, but we did want to compress the HTML, CSS, images, and fonts (everything except the mimetype). To get this right, we did a triple zip: we zipped the mimetype with zero compression into its own file, zipped all the other content with compression into a second file, then combined the two as ANT ZipFileSets. ANT’s ZIP task, among other things, allows you to combine existing ZIP files regardless of their compression level. Because the content files were already compressed, we simply combined the uncompressed mimetype ZIP file and the compressed content ZIP file while retaining the source compression, and voilà, EPUB. Once we had our final product, we used EpubCheck to validate our work.
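For comparison, the same constraint (mimetype stored uncompressed and placed first, everything else compressed) can be met in a single pass outside of ANT. The sketch below uses Python’s standard zipfile module rather than our triple-zip approach; the build_epub name and the directory layout are assumptions, and the output file is assumed to live outside the source directory.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with an uncompressed mimetype entry first."""
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as epub:
        # The mimetype entry goes in first and is stored, not deflated.
        epub.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        # Everything else (HTML, CSS, images, fonts, OPF, NCX) is compressed.
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            arcname = path.relative_to(source).as_posix()
            if arcname == "mimetype":
                continue  # already written above
            epub.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    # Hypothetical paths; point these at a real EPUB staging directory.
    build_epub("out/epub", "out/book.epub")
```

Whichever tool builds the archive, running the result through EpubCheck remains the real test.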
Here, Kindle Kindle …
For the Kindle version, we had to make minor tweaks to the EPUB, the most significant of which was removing the custom fonts, for the sake of keeping it dirt simple. From there, we used KindleGen (Amazon’s command-line utility) to create a MOBI file directly from our EPUB, and we were ready to go.
Hit me with your thoughts on Twitter (@ryan_fulcher) or in the comments below, and check back soon for more from our ongoing Perils of DITA publishing series.