In episode 68 of The Content Strategy Experts podcast, Gretyl Kinsey and Simon Bate talk about unusual outputs from DITA sources.
“With DITA, it’s incredibly flexible. We can generate almost any type of output that we want to with it.”
- DITA to InDesign: getting your paragraph styles in order
- DITA to InDesign: the gruesome details
- Strange bedfellows: InDesign and DITA
Gretyl Kinsey: Welcome to The Content Strategy Experts Podcast brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize and distribute content in an efficient way. In this episode, we talk about unusual outputs from DITA sources. Hello and welcome to the podcast. I’m Gretyl Kinsey.
Simon Bate: And I’m Simon Bate.
GK: Today, we’re going to take a look at some different outputs from DITA that aren’t very common or widely used. I think the best place to kick off here is to talk about what is commonly used. What are the sort of more typical outputs that you see from those sources?
SB: Yeah. We can actually divide this into two areas. One is the output formats and then the other is the output type itself. Among the usual DITA outputs, we have things like manuals, guides, essentially anything that’s paged. We’re predominantly talking about PDFs. Then there’s other outputs, which are more collections of HTML pages, whether it be websites or whatever. Then there are the formats themselves. Of course, the two groups that I’ve listed here, there’s a PDF output and HTML output.
GK: Those are ones that are kind of delivered standard with the DITA Open Toolkit. One thing that we see a lot at Scriptorium is companies that ask us to come in and build customized versions of these outputs for their DITA content. We’ve had a lot of companies that want one or both of these output types and sometimes multiple versions. They might have one PDF transform that handles their manuals. They might have one that handles their data sheets or some other smaller file type, and then they might have HTML for all of their content as well so that they can deliver everything across the board in different ways.
SB: Data sheets themselves are an interesting jumping off point for a discussion about unusual outputs because while I consider manuals and guides to be fairly standard output, data sheets often are an odd duck. Often you have a mapping where you have one DITA topic equals one data sheet. That’s not necessarily true, but that’s what we see a lot of the time. But data sheets because of the density of the information that’s in it require often a specialization or a lot of output class usage and with that comes a great deal of author training or buy in. Anybody writing a DITA topic that’s going to be converted to a data sheet has to know right from the start that that is one of the… The data sheet is a possible output for this content.
GK: Absolutely. I think, like you said, that is a good jumping off point into talking about some more unusual or not so typical outputs that you might get from your DITA sources. I want to start off that discussion by talking about some of the benefits of these less typical outputs. What might make a company say, okay, we’ve got a real case here to go from DITA to something a little bit more unusual than PDF or HTML?
SB: Well, often we find that the clients want to do this because they’re using their DITA already to create what we consider a usual output, but in addition for one reason or another they have a requirement for generating some other kind of output. Part of the desire is to use the same DITA sources to generate both the standard output and to go to some specialized or unusual output format.
GK: Absolutely. I think some of the examples that we’re going to get into and talk about more in depth, one of them is something that we’ve actually covered quite a bit on the podcast, on our blog and even in our LearningDITA live presentations and that is going from DITA to InDesign. One thing that we’ll do is include in the show notes for this episode some links to all of the different content that we’ve produced around that. One of our consultants, Jake Campbell, has done a lot of work on DITA to InDesign and that’s definitely one of those sort of unusual output formats from DITA. But the use case there is that for the most part, the DITA content was going to those usual outputs like PDF or HTML, but there were a few types of documents that needed something more.
GK: Maybe you’ve got data sheets or maybe you’ve got a marketing slick or something that needs to be a little bit more highly formatted, highly designed and customized before it’s actually sent to the printer or posted on the website or what have you. In that case, taking your DITA source into InDesign and doing some of those really specific tweaks to the formatting that you can’t get from something more standardized like a PDF transform is a really good way to do that and not compromise your design.
GK: That’s kind of one of the possible use cases for going to a sort of less typical output format is if you for the most part want to have a standard templatized design for your PDF output, but maybe you’ve got this one set of data sheets or something that does need that extra finessing in InDesign, then you have that transform that takes your DITA sources to InDesign. Then that way you still have all of your DITA in a single source and you don’t have sort of disconnected content being done over here in InDesign and then all the rest of it in a different repository in DITA. You still have that shared repository single source. That’s a really big benefit there. I want to get into now talking about some examples of unusual outputs from DITA.
GK: Simon, I know you’ve done a lot of work on transforms for these, and so I wanted to just ask about some of the different ones you’ve done and what that kind of process has involved.
SB: Well, one of them that we can talk about right away is sort of usual and that is EPUB. EPUB, of course, this is standard. Now this is what, version three of the standard. Essentially what it means is taking HTML output and then packaging it together with a number of other XML files, the documents that describe the structure of your EPUB. In there, as well as with a lot of things based in HTML, most of the work is actually in building the files that describe the thing. We’ve already gotten the transforms prepared for doing the HTML transform. Usually it requires not much change for going to an EPUB. Sometimes some CSS work. But for the most part, the actual work is in doing, as I say, the packaging.
SB: With EPUB, that gets to be one of the problems because I’ve found in working in EPUB it’s a very frustrating standard to work with.
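The packaging step described above can be sketched roughly as follows. This is a minimal illustration, not the transform discussed in the episode: the function names (`build_opf`, `write_epub`), the `OEBPS/` folder name, and the `p0`, `p1` item IDs are assumptions made for the example. It also leaves out pieces a valid EPUB 3 requires, such as the navigation document and the modification date, so the result would need more work before it passes a validator.

```python
import zipfile
from xml.sax.saxutils import escape

# Fixed pointer file that tells a reader where the package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_opf(title, book_id, page_names):
    """Build the package document. The order of page_names becomes the spine,
    which is the reading order (title page, table of contents, chapters...)."""
    manifest = "\n".join(
        f'    <item id="p{i}" href="{name}" media-type="application/xhtml+xml"/>'
        for i, name in enumerate(page_names)
    )
    spine = "\n".join(f'    <itemref idref="p{i}"/>' for i in range(len(page_names)))
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{escape(book_id)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
{manifest}
  </manifest>
  <spine>
{spine}
  </spine>
</package>
"""


def write_epub(path, title, book_id, pages):
    """pages is an ordered list of (file_name, xhtml_text) pairs."""
    with zipfile.ZipFile(path, "w") as zf:
        # The mimetype entry has to be the first file and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf",
                    build_opf(title, book_id, [name for name, _ in pages]),
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, xhtml in pages:
            zf.writestr("OEBPS/" + name, xhtml, compress_type=zipfile.ZIP_DEFLATED)
```

Most of the code is about the descriptive files rather than the content itself, which matches the point that the HTML is largely reusable and the effort goes into the packaging.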
GK: You mentioned that EPUB is a little bit of a difficult output type to work with. What are some of the challenges that are involved with developing a DITA to EPUB output?
SB: A lot of them are actually in the sequencing. There’s a particular XML file that describes the order in which things come. It’s been a little while since I’ve touched it, so I can’t remember exactly where the problems lay. But there were issues with particularly dealing with the front matter of the EPUB, trying to get a title page in, trying to make sure the table of contents fit in, and other pagination things around that. That was in particular the really hard part. Flowing the text, most of the text actually is very straightforward. Some of the problems come with things like titles of the content. For normal structure of content with various nested topics with titles in them, those will fall out okay.
SB: But when you start introducing things like a topic head in a map, there’s not much provision within the EPUB standard for a title to exist without any content below it. You have a title, then you go straight to the title of the next thing down, that’s rather difficult to deal with in EPUB.
GK: It sounds like there are difficulties with regard to how EPUB renders the DITA structure, but then one thing that I can remember from testing EPUB output as well is that there’s a bit of a challenge for making sure the EPUB displays consistently across different mobile devices as well. I know that that’s a big consideration if you’re thinking about EPUB output is how much control do you want over how it displays on an iPad versus a Kindle versus any other sort of e-reader or mobile device or tablet because it’s really, really difficult to ensure it looks the same and I would say probably impossible to make sure it looks the same.
SB: Not just across different devices. There are also a number of different readers out there on some platforms. On Macintosh and on PC, there are a number of different readers. On some less restrictive tablets, say Android, there are a number of readers you can find. For Apple, there are a handful of readers. The Apple Reader itself has its own quirks. When you test it, you have to look out for all of those things. Kindle actually brings up a whole different set of problems because the Kindle format is not quite the same as the EPUB 3 standard or EPUB 2 even. You have to make additional changes, additional modifications to go to Kindle.
GK: Yeah. I think those are all really important things to think about. Sort of with all of these unusual outputs that we’re talking about, there are sort of different risks and different considerations to make sure that you think about before you start building those outputs.
SB: That’s right. It’s not just the transform. It’s the testing. For some of these formats, that can consume a great amount of resources.
GK: Absolutely. What’s another unusual output that you’ve worked on?
SB: I think through this discussion we’ll be diving deeper and deeper into weirder and weirder outputs. The next one again can be expected to be a normal output in some sense and that is LMS or learning management systems. Often people want to go either from a normal DITA that is topic, concept, task, and reference or even the Learning and Training Specialization into a content that’s consumed by a learning management system. Of course, there are dozens and dozens of learning management systems. There’s a wealth of experience to be had there and we haven’t even touched much of it at all really. One thing that’s used a lot in learning management systems is the SCORM standard.
GK: Yeah. One use case that I wanted to bring up with regard to learning content going into a learning management system is actually LearningDITA.com.
SB: That’s right.
GK: That is, as most of you probably know, Scriptorium’s free e-learning resource for DITA training. We have actually or I should say Simon has developed the process that takes content from DITA into the LMS that we use for that.
SB: That’s correct. For LearningDITA, we used the Learning and Training Specialization for all the sources. In fact, if you want to, you can go into Git and access the Learning and Training sources yourselves and see what we did with it. Now, moving it into the learning management system was an interesting thing because first we had to find a learning management system. We found one that’s actually a plugin for WordPress. WordPress itself brings up its own issues. The transforms themselves, we had to do several things. One is we had to figure out how the learning management system fit into WordPress and what the files looked like for that.
SB: Now, when you’re looking for learning management systems, if you’re going to be doing anything like this, one important consideration when you’re looking at learning management system is to think about the import, the import limitations or whatever facilities there are on import for the LMS. It turned out for us what we needed to do was to craft some files in a particular form and then be able to import them into WordPress. A lot of the work there really was reverse engineering. We took a look at WordPress import and export files and found the important parts, the pieces that we needed to preserve, and what we could pull in from metadata from the topics, what we actually had to specify when we were doing the import.
SB: Then we created the transform to take our DITA and transform it to the XML, which we can then import into WordPress. Now, in addition to the actual topics themselves, the learning management system managed the questions. I’m sure many of you have been in LearningDITA and you’ve experienced the quizzes at the end of each of the sections and those quizzes are managed by the learning management system. There’s an entirely separate file format that we had to come up with for that. We had to, again, reverse engineer how the learning management system needed its question.
SB: Then there’s also a complex process that we go through to first import the topics themselves into WordPress and then a separate process for importing the questions into the learning management system and then tying the whole thing up and tying it up with a bow.
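To make the DITA-to-import-file idea more concrete, here is a small sketch. The target format is entirely hypothetical: the `item`, `slug`, `status`, `tag`, and `content` element names are invented for illustration and are not the WordPress or LMS plugin format used for LearningDITA, which the episode does not spell out. The point is only the shape of the work: some values come from topic metadata, and some have to be supplied at import time.

```python
import xml.etree.ElementTree as ET
from html import escape

DITA_TOPIC = """<concept id="intro_to_maps">
  <title>Introduction to maps</title>
  <prolog><metadata><keywords><keyword>maps</keyword></keywords></metadata></prolog>
  <conbody><p>A map organizes topics.</p><p>Maps can nest.</p></conbody>
</concept>"""


def topic_to_import_item(topic_xml, status="draft"):
    """Turn one DITA topic into one record of a made-up import format."""
    topic = ET.fromstring(topic_xml)
    item = ET.Element("item")  # hypothetical import record

    # Values that can be pulled straight from the topic and its metadata.
    ET.SubElement(item, "slug").text = topic.get("id")
    ET.SubElement(item, "title").text = topic.findtext("title")
    for keyword in topic.iter("keyword"):
        ET.SubElement(item, "tag").text = keyword.text

    # A value the source does not carry, so it is specified at import time.
    ET.SubElement(item, "status").text = status

    # Flatten the body paragraphs into a simple HTML string.
    body = next(child for child in topic if child.tag.endswith("body"))
    html = "".join(
        "<p>" + escape("".join(p.itertext())) + "</p>" for p in body.iter("p")
    )
    ET.SubElement(item, "content").text = html
    return item
```

Calling `ET.tostring(topic_to_import_item(DITA_TOPIC), encoding="unicode")` serializes the record. A real import format would be worked out the way it was described above, by inspecting the system’s own export files.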
GK: If you’ve got a situation where you have content creators, let’s say in the training department they’re working in DITA and they need to create output that goes to a learning management system and let’s say you’ve also got your technical content is sharing that same DITA source and maybe some other different departments, they’ve all got content in that same DITA repository, what are some of the considerations that the training team would need to keep in mind when it comes to choosing an LMS so that that output can be as efficient as possible?
SB: That’s kind of hard. I think a lot of it gets back to my initial statement that the import facility has to be there. Much of the issue with learning management system itself is just mapping from the DITA into what you can move into the learning management system. With DITA, it’s incredibly flexible. We can generate almost any type of output that we want to with it. I can’t think of any limitations actually in the authoring because almost everything has to be done in the transform itself. Now, once you’ve selected the learning management system, that selection process may come with certain limitations, certain things that are possible to do in learning management systems, some things that are not. That then is going to feed back into what the writers can do or what your content creators can do.
GK: Yeah, and that’s why it’s so important I think to keep up that communication amongst everybody that’s going to be using your DITA sources and contributing to it and making sure that what one team does doesn’t affect something that another team’s going to do in a negative way and that everything’s working together as sort of this DITA ecosystem. Speaking of training materials and training content, you’ve also developed another output type, which is DITA to slides. I wanted to talk about that a little bit.
SB: Yeah. That actually falls into two different groups. There was the initial attempt that I made a number of years ago. As part of my work here, I do a lot of training. I thought, well, the training content itself ought to be in DITA. That’s fine for putting together the sheets that I work from when I’m doing training, but then we’d also want those same materials to be presented in slides on the screen while I’m doing the training. It occurred to me that I could write a transform. HTML seemed to be the obvious choice. It was fairly flexible and it could be used almost anywhere. We can take this content and transform it, and I can generate my slides and I can generate my handouts and other training materials all from the same content.
SB: There were some things that I had to do and this actually will get into the second aspect of doing a training or slide material and that is there has to be a system somehow of indicating what you do want to have on the slides and what you don’t want to have on slides. With my first slide transform, what I was able to do is make certain rules about where things appeared in bulleted lists, whether it was in a paragraph within the list item or not, and then add some output classes to say, this is not for the slides, this is not for the printed output. Then using those rules, I could generate materials for both. The second effort onto doing slides is a little bit more complex. This is at a client request.
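The rule-based approach from that first slide transform can be illustrated with a short sketch. The specific rules here are assumptions, since the episode does not give them: text directly inside a list item goes on the slide, a paragraph nested in the list item is handout-only, and the `noslides` and `noprint` output class values are invented names.

```python
import xml.etree.ElementTree as ET

TOPIC = """<topic id="t1">
  <title>Conditional processing</title>
  <body>
    <ul>
      <li>Filter with DITAVAL<p>Longer explanation for the handout only.</p></li>
      <li outputclass="noslides">Background detail for the handout</li>
      <li outputclass="noprint">Ask the room a question</li>
    </ul>
  </body>
</topic>"""

# Hypothetical outputclass tokens that exclude an item from one deliverable.
SKIP_TOKEN = {"slides": "noslides", "print": "noprint"}


def bullets_for(topic_xml, target):
    """Return the list-item text for one deliverable: 'slides' or 'print'."""
    skip = SKIP_TOKEN[target]
    bullets = []
    for li in ET.fromstring(topic_xml).iter("li"):
        if skip in (li.get("outputclass") or "").split():
            continue  # author marked this item as not for this deliverable
        if target == "slides":
            # Slides get only the text that sits directly in the list item.
            text = (li.text or "").strip()
        else:
            # Print gets everything, including nested paragraphs.
            text = " ".join(" ".join(li.itertext()).split())
        bullets.append(text)
    return bullets
```

With the sample topic, `bullets_for(TOPIC, "slides")` keeps the short bullet and the question for the room, while `bullets_for(TOPIC, "print")` keeps the long explanation and the background item. Authors have to know these conventions for the rules to work, which is the training point raised elsewhere in the conversation.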
SB: They had a bunch of training materials and they needed to have it not just as handouts, but they wanted to use PowerPoint. We will talk about going to Word a little later, but we’ve had some previous experience in trying to go to Word or Office packages. This time around it occurred to me there were two things, and one is my experience was in dealing with almost anything in Office, hierarchy is mostly… Hierarchy is ignored. You have to throw out the hierarchies. That is you have to flatten your structure. But the other thing was that in our other effort, we went directly to the XML, the Office XML format. That turned out to be a really, really hard thing to do.
SB: This time around it occurred to me, well, Microsoft Office has a great VBS, that is Visual Basic Library, for loading things into PowerPoint files. What I did was create something that’s a two-step process. The first process is to take the DITA and to flatten the structure. While I’m flattening, I can do a lot of pre-processing, I can identify things. The output of the pre-process is essentially built with slides in mind. As I’m building this out, I can build out decks of slides from the content and tag things accordingly. This output format by the way is not XML, and I’ll get into that a little bit later. With the output format I can then put all the content that’s going to go out and then I take that output format and run a Visual Basic Script on that output file, on that flat file.
SB: The Visual Basic then actually finds the PowerPoint template, opens the template as a new document, and then starts to load content into that template slide by slide based on the content of the flat file. Because it’s based on the content of the flat file and because I found parsing limitations very, very restrictive in Visual Basic, I just used a plain text file that has some simple delimiters. It would be really nice, it would be much, much nicer if I could have used XML, but unfortunately I couldn’t. I looked into a number of different ways of using XML in Visual Basic and it’s just not possible. I can parse the file with my simple rules in Visual Basic and load it all into the slides.
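A sketch of that two-step idea, under stated assumptions: the `SLIDE|` and `BULLET|` line prefixes are an invented delimiter scheme, not the actual flat-file format from the project, and the loader here is written in Python purely to show how little parsing such a file needs. In the process described above, the second step was a Visual Basic script that filled a PowerPoint template.

```python
import xml.etree.ElementTree as ET

DITA = """<topic id="deck"><title>Deck</title>
  <topic id="a"><title>Why DITA</title>
    <body><ul><li>Reuse</li><li>Single source</li></ul></body>
    <topic id="b"><title>Reuse in depth</title>
      <body><ul><li>Conrefs</li></ul></body>
    </topic>
  </topic>
</topic>"""


def flatten(topic_xml):
    """Step 1: walk nested topics in document order and emit one flat line
    per slide title or bullet. The topic hierarchy is deliberately dropped."""
    lines = []
    for topic in ET.fromstring(topic_xml).iter("topic"):
        lines.append("SLIDE|" + topic.findtext("title"))
        body = topic.find("body")
        if body is not None:
            for li in body.iter("li"):
                lines.append("BULLET|" + " ".join(" ".join(li.itertext()).split()))
    return "\n".join(lines) + "\n"


def parse_flat(text):
    """Step 2 stand-in: read the flat file back with simple string splitting,
    the kind of parsing that is easy in almost any scripting language."""
    slides = []
    for line in text.splitlines():
        kind, _, value = line.partition("|")
        if kind == "SLIDE":
            slides.append({"title": value, "bullets": []})
        elif kind == "BULLET" and slides:
            slides[-1]["bullets"].append(value)
    return slides
```

Because the flat file carries no nesting, the nested topic “Reuse in depth” simply becomes the next slide after its parent.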
SB: One of the other things that I found as I was working in Visual Basic was that there are actual differences between how Visual Basic behaves in Windows and how it behaves in Macintosh. I do a lot of my development work in Macintosh, but the client was in Windows and we knew that was going to be an important target for them. We started testing in Windows and found that things that I had developed in Macintosh just did not work under Windows. Interestingly while I was trying to develop some other things in the process itself, I found the lesson back the other way. I was looking, did Google searches, trying to see how in Visual Basic I could create a file selection dialogue let’s say to find the file that we’re going to be loading into the template.
SB: I could find lots of things about how to do it on Windows. I thought, well, it should work just the same on Macintosh. It turned out it didn’t. On Macintosh, actually I had to write a whole separate routine for locating the file and loading that file into the script.
GK: Yeah, and I think that really gets back to some of the points that we made earlier when we were talking about EPUB and testing across different platforms and different readers, and then the same thing with going to something like SCORM and testing across different LMSs. It’s going to be different across different systems, operating systems as well. That’s something to keep in mind if you have to build one of these types of outputs to consider are you just using Windows or just using Macintosh or do you maybe have a use case for both? That’s all going to play an important role in kind of how much time and how many resources are going to be involved in developing an output like this. Earlier you mentioned that you had done some work for not just PowerPoint, but for Word as well.
GK: Tell us a little bit about that and kind of how a DITA to Word transform works.
SB: Right. To recap what I was saying initially was that we had gone from DITA straight to the Word DOCX XML format, which turned out to be very, very difficult to work with. It’s very, very difficult to test, very difficult to get things right. It expects things in a very particular order, and it expects all the content to be flattened out. We were successful. We managed to complete the project going to Word. But if we were to do it again, we would certainly use the Office libraries and again use Visual Basic. The nice thing is now that we’ve got a format that we can use for flattening the file, the text file that I’ve developed for PowerPoint will actually work very well for Word. In the future if we need to go to Word, we’re all set and ready to go with that.
GK: That’s really great because I think it is pretty… I don’t know if common is the right word, but I think it’s pretty smart if you have got a lot of people using Microsoft Office products that you might want to have an output that goes to Word and an output that goes to PowerPoint, that both kind of used that Visual Basic starting point. I think that makes a lot of sense if that’s kind of a need at your company that you know that you’ve got a lot of people that need to take that DITA content into various Microsoft Office programs, that having that Visual Basic beginning point is really a solid plan.
SB: There’s actually a third Office product, which leads actually into the next area of things that I was going to discuss, and that is Excel. Because Excel, of course, spreadsheet is nothing more than a database in a matrix. We’ve done a number of things converting our DITA content to database formats of one kind or another. But some of the other formats that we’ve gone to for database, they’re all fairly much the same and because they’re all text formats, fairly easy to go to. That includes comma-separated value files. There we’ve often had people who say, well, we need a table converted to a comma-separated value file so that then we can load it into a database or we can load it into Excel. We’ve also done a number of things using JSON as our output format.
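As a small illustration of the table-to-CSV and table-to-JSON case, the sketch below reads a simple DITA table and writes both formats. The sample table and the function names are made up for the example, and it assumes a plain grid: spanned cells (`namest`, `nameend`, `morerows`) are not handled.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

TABLE = """<table><tgroup cols="3">
  <thead><row><entry>Part</entry><entry>Color</entry><entry>Note</entry></row></thead>
  <tbody>
    <row><entry>A</entry><entry>yellow</entry><entry>ships in 2, 4, or 6 packs</entry></row>
    <row><entry>B</entry><entry>green</entry><entry><p>special order</p></entry></row>
  </tbody>
</tgroup></table>"""


def table_rows(table_xml):
    """Return the table as a list of rows, header first, each cell as plain text."""
    table = ET.fromstring(table_xml)
    return [
        [" ".join(" ".join(entry.itertext()).split()) for entry in row.findall("entry")]
        for row in table.iter("row")
    ]


def to_csv(rows):
    """Let the csv module handle quoting, for example cells that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Use the header row as keys so each body row becomes one JSON object."""
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body], indent=2)
```

Using the `csv` module rather than joining strings with commas matters as soon as a cell contains a comma or a quotation mark, as the “Note” column does here.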
GK: We’ve talked a lot about taking DITA content into some of these Microsoft-based products like PowerPoint and Word and Excel, but what about going in a different and I guess more visual direction and taking DITA into SVGs or Scalable Vector Graphics?
SB: That’s actually an interesting thing and for me a very fun thing to do. I like playing with graphics. I like playing with SVG. SVG itself is nice because it’s an XML format, so we’re going from DITA, which is XML, to another XML format, which is always a whole lot easier than trying to go to something else. We’ve gone from DITA to SVG for a number of different output types. Some of them are things like diagrams of registers in chips. We have content in a table, and we can take that tabular content, which specifies a bit offset position, width for the field, and what’s the content of that register, what’s the register’s name or actually the field’s name, and then lay those things out into an image that looks vaguely descriptive of the way that register appears.
SB: This was incredibly useful to one of our clients because they had thousands and thousands of these things. The information was extracted first from a database and then moved into XML in DITA and then we pulled it out and were able to format this.
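The register diagram idea can be sketched in a few lines. This is an illustration only: it assumes the field name, bit offset, and width have already been pulled out of the DITA table into tuples, and the layout choices (bit 0 on the right, 20 units per bit, the `register_svg` name) are arbitrary.

```python
from xml.sax.saxutils import escape


def register_svg(fields, total_bits=32, bit_w=20, height=40):
    """Draw one register as a row of labeled boxes.

    fields: list of (name, bit_offset, bit_width) tuples.
    Bit 0 is drawn at the right edge, as in many hardware manuals.
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{total_bits * bit_w}" height="{height}">'
    ]
    for name, offset, width in fields:
        x = (total_bits - offset - width) * bit_w  # left edge of the field box
        w = width * bit_w
        parts.append(
            f'<rect x="{x}" y="0" width="{w}" height="{height}" '
            f'fill="none" stroke="black"/>'
        )
        parts.append(
            f'<text x="{x + w // 2}" y="{height // 2}" text-anchor="middle" '
            f'dominant-baseline="middle" font-size="10">{escape(name)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

For an 8-bit register, `register_svg([("EN", 0, 1), ("MODE", 1, 3), ("RESERVED", 4, 4)], total_bits=8)` produces three boxes with the enable bit at the far right. Generating the picture from the table data is what makes this practical for thousands of registers.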
GK: Yeah. I’ve seen a lot of cases too where you have parts diagrams where different pieces have to be labeled. For localization purposes, they wanted the text to be in one layer and the image to be in another. That’s where SVGs were really, really helpful. We’ve seen that as a major use case for going from DITA to SVG. We’ve also seen things like with training content, if you’ve got a hotspot style question or something where you’re matching up pieces of text to pieces of an image, then that’s where SVGs can be really helpful as well to again have that separation where your text is in one layer, your image is in another. That works for both that and for localization purposes. There’s a whole lot of benefit that you can get out of having SVGs as an output format.
SB: That’s correct. It works not just in the SVG, but actually in the DITA sources themselves because we have one client where they have some massive tables that describe in detail how you put together a particular part number that describes a particular thing. Again, there are fields where there are values, so an A represents a yellow one and a B represents a green one, things like that. We can take that information from the DITA content and create a diagram that shows again how a person making an order would put together the part number for their appropriate piece of equipment. One of the things we can do at the same time is we can generate a list on the side of what are the actual names of these things.
SB: Now, this information comes from DITA and the DITA can start out in English as the primary language, but also the DITA then can be translated. We can take in that translated DITA and then convert it into just the same table, but only in German or Swedish or Spanish or whatever we want to choose at that time.
GK: We’ve talked about SVGs as something where you’re going from DITA which is one XML format to another. I know that one thing that I wanted to address is going from XML to something else instead of necessarily DITA to something, are there any cases where you’ve just gone from XML to another format or maybe XML to XML in a similar way that you’ve done with SVGs?
SB: Yes. Getting back to some of our earlier examples, we have gone from DITA to XML when dealing with training materials because again, we’re dealing with content in an LMS. The LMS’s input isn’t necessarily going to be DITA. In fact, it usually isn’t DITA, but often the LMS will take its input content in an XML file. We have to go to the XML file to do that.
GK: I want to kind of attempt to wrap things up with a final I guess not just question, but set of questions or considerations around unusual outputs and that is just what advice would you give if a company is thinking about maybe they’ve already got PDF or HTML or something that’s more typical, but then they’re thinking about adding maybe DITA to PowerPoint or DITA to InDesign or something that’s a little bit less common? What advice would you give them regarding the time and resources involved and some of the challenges that they might come up against that they might not have encountered when they did their more typical outputs?
SB: Well, there’s a great deal of crystal ball time, of course. The real problems you’re going to find are when you get to a brick wall. You work on something and then you find that actually there’s no way to do it or it’s going to require something different. Often that something different in DITA translates back into either using an output class or creating a specialization. If you can, look at the formats, look at where you’re going and what are some of the requirements of that format and are there going to be things that may be difficult to come to from DITA. You can do some of that work early on, but a lot of that experience, a lot of that learning is going to actually occur when you’re actually trying to go into whatever format you’re going to.
SB: I would say in general pad your estimates, build in a lot of extra time to allow for dead ends, allow for where you had to try… You thought your implementation was going to go in one direction, but you found out eventually that you have to do something different for that.
GK: Yeah. I would agree 100%. I think that going to something that’s a less typical output does require a whole lot more time and resources for testing, for not just testing, but testing the limits of what’s possible. It’s important to think about that and not say, oh, you know, it took so many hours to develop PDF, so it’ll be about the same for InDesign. That’s absolutely not the case at all. You really have to think about what are you trying to do? What are the possible limitations that you’re going to run into? What are the compromises that you’re willing to make when you do run into a limitation because it’s pretty much inevitable, and how much budget or time resources do you have to dedicate to developing that output?
GK: Those are all really important things to think about when you’re still in the planning stage before you get too deep into it.
SB: That brings to mind another thing is part of your work is going to be training your authors because there’s often going to be things, whether it’s an output class or a specialization, where the authors are going to have to know about particular decisions you had to make, things they have to do, things they have to do a particular way in order to get it to work. You’d like it to be just perfect that you can author anything in DITA and convert it into whatever your target format is, but the truth is you will find limitations and you will have to work around those limitations, but then you have to communicate how do you work around those limitations to your writers.
GK: I think that gets to a point too about kind of what is the importance of your different outputs, what’s the priority for you. Because if you have a very, very strong business need to go from let’s say DITA to Word and that’s kind of a much more atypical output than DITA to PDF or something, but that’s something that’s very, very important for you, then that does have a lot of impact on maybe how you’re writing and structuring your DITA content. It cuts both ways and you can’t just take one particular method or standard of writing your DITA content and then say this is going to work across the board for PDF and HTML and Word, PowerPoint, InDesign, whatever. You have to think about which outputs are the most important to us and then what needs to be in our DITA content model to support that.
SB: Yeah. On top of that, I would say the last thing on testing or trying to come up with your estimates, and we’ve hit on this a number of times already in here, is just that there are differences across platforms, there are differences across tools. If you’re going to be using a number of different ones, you’re going to be using several different platforms, you have to make sure that’s part of your testing plan. You have to also plan for that in your time to know that you’ll have to add extra time to build in those accommodations for those other platforms.
GK: Absolutely. I think kind of to wrap things up, our final parting words of advice would be something along the lines of these unusual outputs can do a lot of really cool and interesting things for you and they might satisfy some really important business requirements, but it comes with the caution of plan ahead. Really, really think about the considerations as you would do with anything content wise before you go ahead with those types of outputs.
GK: Well, thank you so much, Simon, for joining me today. And thank you for listening to The Content Strategy Experts Podcast brought to you by Scriptorium. For more information, visit scriptorium.com or check the show notes for relevant links.