November 4, 2017

Full transcript of specialization podcast

00:02 Gretyl Kinsey: Welcome to the Content Strategy Experts podcast, brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize, and distribute content in an efficient way. In Episode 16, we discuss DITA specialization.

00:20 GK: Hello, and welcome to the Content Strategy Experts podcast. I’m Gretyl Kinsey, and I’m a Technical Consultant at Scriptorium.

00:27 Sarah O’Keefe: And I’m Sarah O’Keefe, and I’m CEO here at Scriptorium.

00:30 GK: And today, we’re here to talk to you about DITA, and in particular, how to specialize DITA and some situations where that might benefit you. So first, let’s just talk about what DITA is in general.

00:45 SO: So, “ditta,” or possibly “deeta” if you’re in Europe: DITA is the Darwin Information Typing Architecture. It is an XML standard, and it is used heavily for product content and technical documentation. It was originally intended for software documentation, but it is now used for other things, such as heavy machinery documentation, e-learning, and a whole host of other things that are all technical.

01:17 GK: Mostly technical, yes.

01:18 SO: Technical, medical… Yeah.

01:20 GK: And one of the most interesting things about DITA is the ability to specialize. What is specialization, for those who don’t know?

01:29 SO: What is specialization? So the Darwin in DITA refers to specialization, by analogy with the theory of evolution. In DITA itself, you have a baseline set of elements, and then you have the ability to add to those elements, customize them, and specialize them, creating new elements that better meet your particular requirements. What makes specialization unique is that when you add elements or customize the ones that are there, the specialization process leaves you with a set of elements that are still valid DITA. They’re not some weird custom DITA-prime, and they’re not an offshoot of DITA; they still conform to the DITA standard.
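A podcast can’t show markup, so here is a minimal sketch of why specialized elements remain valid DITA. It relies on the standard DITA convention that every element carries a class attribute (normally defaulted by the DTD or schema) listing its ancestry from most general to most specific, for example "- topic/sectiondiv disease/introduction ". The disease and introduction names echo the AJCC case study but are hypothetical here, as is the generalize() helper. Renaming each element to the first token in its class value turns the specialized content back into base DITA, a simplified take on what DITA calls generalization.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialized topic. Each class value lists the element's
# ancestry: a '-' marker, then module/element tokens, general to specific.
SPECIALIZED = (
    '<disease id="example" class="- topic/topic reference/reference disease/disease ">'
    '<title class="- topic/title ">Example disease</title>'
    '<refbody class="- topic/body reference/refbody ">'
    '<section class="- topic/section ">'
    '<introduction class="- topic/sectiondiv disease/introduction ">'
    '<p class="- topic/p ">Overview text.</p>'
    '</introduction>'
    '</section>'
    '</refbody>'
    '</disease>'
)


def base_name(class_attr: str) -> str:
    """Return the base element name, e.g. 'sectiondiv' for an introduction."""
    tokens = class_attr.split()
    # tokens[0] is the '-' (structural) or '+' (domain) marker.
    return tokens[1].split("/")[1]


def generalize(root: ET.Element) -> None:
    """Rename every element in place to its base DITA name.

    The class attribute is left untouched in this sketch, so the
    specialized ancestry is still recorded on each element.
    """
    for node in root.iter():
        cls = node.get("class")
        if cls:
            node.tag = base_name(cls)


root = ET.fromstring(SPECIALIZED)
generalize(root)
print([node.tag for node in root.iter()])
# ['topic', 'title', 'body', 'section', 'sectiondiv', 'p']
```

Every generalized element is a plain base DITA element, which is the sense in which a specialization never leaves the standard.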

02:19 GK: And what’s the real benefit of this?

02:22 SO: So the real benefit is that with previous standards in the SGML and XML world, once you started specializing, you essentially broke the standard. You’re off-roading. You’re no longer in, for example, DocBook or the ATA spec. So then you have to maintain that customization, the custom thing that you’ve built for your particular company. In DITA, even if you specialize, what you have still conforms to the standard, which means it is easier and less expensive to maintain your DITA going forward. The other interesting thing about specialization is that the processing layer still works, so you can use the out-of-the-box DITA Open Toolkit, and it will successfully process your specialized content.
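To illustrate how the processing layer can keep working, here is a toy renderer that knows only three base DITA types. The DITA Open Toolkit’s XSLT templates match on class tokens such as ' topic/p ' rather than on element names; this sketch borrows that idea in Python, but the HANDLERS table, the output format, and the introduction element are hypothetical. An element the renderer has never heard of falls back to the handler for its nearest known ancestor.

```python
import xml.etree.ElementTree as ET

# Handlers keyed by class token. Only base DITA types are known here.
HANDLERS = {
    "topic/title": lambda text: f"# {text}",
    "topic/p": lambda text: text,
    "topic/sectiondiv": lambda text: f"[div] {text}",
}


def handler_for(class_attr: str):
    """Return the handler for the most specific class token we know.

    Tokens run from general to specific, so they are scanned in reverse;
    an unknown specialization falls back to its nearest known ancestor.
    """
    for token in reversed(class_attr.split()[1:]):  # skip the '-' marker
        if token in HANDLERS:
            return HANDLERS[token]
    return None


def render(root: ET.Element) -> list[str]:
    """Render each element's own text with its handler (tails are ignored)."""
    lines = []
    for node in root.iter():
        handler = handler_for(node.get("class", ""))
        text = (node.text or "").strip()
        if handler and text:
            lines.append(handler(text))
    return lines


doc = ET.fromstring(
    '<topic class="- topic/topic ">'
    '<title class="- topic/title ">Staging</title>'
    '<body class="- topic/body "><section class="- topic/section ">'
    '<introduction class="- topic/sectiondiv disease/introduction ">Intro text</introduction>'
    '<p class="- topic/p ">Body text</p>'
    '</section></body></topic>'
)
print(render(doc))  # ['# Staging', '[div] Intro text', 'Body text']
```

The renderer was never told about introduction, yet it still produces output for it by treating it as a sectiondiv.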

03:26 GK: Now, there are some specializations that are actually included with baseline DITA, and I want to talk about that a little bit. First of all, just to back up, there are some standard DITA topic-types, like concept, task, and reference, that are used for a lot of technical content. Those were all originally specialized from the generic DITA topic. A lot of times, you don’t even think of those as specializations, but they really are, so they’re a good place to see where specialization started. There are other specializations included with baseline DITA that get a little more complex. One of those, which we recently discussed on the podcast, is the learning and training specialization: a set of DITA topic-types and elements associated specifically with learning content.

04:20 SO: Right, so you can build an entire learning environment using the learning and training specialization. Our site is a good example of that. And definitely, the learning and training specialization would be an interesting one to look at if you’re in that area. Now, Gretyl, you’ve done a couple of projects with this, and we’ve had some projects where we just did sort of a little bit of specialization. But I think you’ve had a couple with very heavy specialization. Can you talk a little bit about what that looks like?

04:49 GK: Sure. One example of an almost entirely specialized DITA environment is on our website as a case study: our project with the AJCC, the American Joint Committee on Cancer. They needed to provide their content via an API, which meant they needed the ability to call for very small, specific chunks of their content. So essentially, they needed a specialized element name for every single little piece of their content, and that meant DITA specialization at almost every level. First, we created a specialized topic-type. In their case, they cover different diseases, different cancers, so we took the DITA reference topic-type and specialized it into a topic-type called disease. So that was specialization, first of all, just at the topic level. And then we went further down.

05:58 GK: So within DITA there’s also a sectiondiv element, which lets you group content into smaller chunks within a section. Each kind of section or heading in their content had a specific name that was consistent across every topic, so to give them the semantic value they needed to pull each of those via the API, we created a specialized element for each one. For example, there’s a specialized introduction: instead of a generic sectiondiv that you’d have to identify somehow as the introduction, it’s just an element called introduction. That way, the API always knows what to do if somebody says, “Get me the introduction from this particular disease,” or, “Get me all the introductions from across all the different disease topics.” It can find them very easily and pull them in.
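Here is a minimal sketch of the kind of lookup a dedicated element name makes easy. It assumes two hypothetical disease topics (class attributes omitted for brevity) and a hypothetical get_introductions() helper; the real AJCC API and CCMS are not described in the episode, so this only shows the idea, not their implementation.

```python
import xml.etree.ElementTree as ET

# Two hypothetical disease topics; ids and text are illustrative only.
TOPICS = [
    """<disease id="disease-a">
         <title>Disease A</title>
         <refbody><section>
           <introduction><p>Intro for A.</p></introduction>
         </section></refbody>
       </disease>""",
    """<disease id="disease-b">
         <title>Disease B</title>
         <refbody><section>
           <introduction><p>Intro for B.</p></introduction>
         </section></refbody>
       </disease>""",
]


def get_introductions(topics, topic_id=None):
    """Return {topic id: introduction text}, optionally for one topic only."""
    results = {}
    for source in topics:
        root = ET.fromstring(source)
        if topic_id is not None and root.get("id") != topic_id:
            continue
        intro = root.find(".//introduction")
        if intro is not None:
            # Collect all text inside the element and normalize whitespace.
            results[root.get("id")] = " ".join("".join(intro.itertext()).split())
    return results


print(get_introductions(TOPICS))
# {'disease-a': 'Intro for A.', 'disease-b': 'Intro for B.'}
print(get_introductions(TOPICS, "disease-b"))
# {'disease-b': 'Intro for B.'}
```

With only a generic sectiondiv, the same query would have to guess which chunk is the introduction from its content or position; the specialized name removes that guesswork.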

06:57 GK: And we also did this for their tables, because they had very specific requirements: they used tables to show all of the rules for how you stage a cancer, stage one through four. We took the simpletable element and the elements within it for rows and cells, and we created specializations that allowed those rules to be shown and let the API pull just small pieces from those tables.
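A rough sketch of pulling one value out of a specialized table, assuming hypothetical stagingtable, stagerow, criteria, and stage elements specialized from simpletable, strow, and stentry. The cell values are placeholders, not real staging rules. Because each cell is named for what it holds, the lookup doesn’t depend on column order.

```python
import xml.etree.ElementTree as ET

# Hypothetical staging table specialized from simpletable. Element names
# and cell values are placeholders, not real staging rules.
TABLE = """<stagingtable class="- topic/simpletable disease/stagingtable ">
  <stagerow class="- topic/strow disease/stagerow ">
    <criteria class="- topic/stentry disease/criteria ">criteria A</criteria>
    <stage class="- topic/stentry disease/stage ">1</stage>
  </stagerow>
  <stagerow class="- topic/strow disease/stagerow ">
    <criteria class="- topic/stentry disease/criteria ">criteria B</criteria>
    <stage class="- topic/stentry disease/stage ">2</stage>
  </stagerow>
</stagingtable>"""


def stage_for(table: ET.Element, criteria: str):
    """Return the stage cell for the row whose criteria match, else None."""
    for row in table.iter("stagerow"):
        if row.findtext("criteria", "").strip() == criteria:
            return row.findtext("stage", "").strip()
    return None


table = ET.fromstring(TABLE)
print(stage_for(table, "criteria B"))  # 2
```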

07:27 SO: So really, at the end of the day, when you look at that content, when you look at the tags that are in it, it doesn’t even look like DITA.

07:35 GK: It doesn’t.

07:35 SO: There are some paragraphs, some p tags, and a couple of other things. But the other extreme is the projects we’ve done where there’s been hardly any specialization. There, we’ll look at the content during the content modeling phase and say, “Ps, and sections, and list items, and whatever: those are all fine, this is just fairly basic content.” Usually, though, we have to specialize the attributes, right? The metadata is unique to that particular organization, so we end up making some changes to the metadata to accommodate the particular ways that organization needs to label its information.
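Custom conditional-processing attributes are typically specialized from DITA’s props attribute, and a common use for that metadata is filtering. A minimal sketch, assuming a hypothetical attribute called region and a hand-rolled filter in the spirit of a DITAVAL exclude rule: when the attribute holds several values, the element is dropped only if all of them are excluded. Real DITAVAL processing has more options than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical content labeled with a custom "region" metadata attribute.
DOC = """<topic>
  <p>Shown everywhere.</p>
  <p region="emea">EMEA only.</p>
  <p region="emea apac">EMEA and APAC.</p>
</topic>"""


def is_excluded(elem, attr, excluded_values):
    """True if the attribute has values and every one of them is excluded."""
    values = elem.get(attr, "").split()
    return bool(values) and all(v in excluded_values for v in values)


def filter_tree(root, attr, excluded_values):
    """Remove excluded descendants in place (the root itself is kept)."""
    for parent in list(root.iter()):
        for child in list(parent):
            if is_excluded(child, attr, excluded_values):
                parent.remove(child)
    return root


root = filter_tree(ET.fromstring(DOC), "region", {"emea"})
print([p.text for p in root.iter("p")])
# ['Shown everywhere.', 'EMEA and APAC.']
```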

08:19 GK: Exactly.

08:20 SO: So clearly, doing minimal specialization is going to be cheaper.

08:26 GK: Exactly, yes.

08:26 SO: Less work. Now, in the case of the AJCC, or someone like that with very specific structure requirements, what kind of costs are you looking at when you’re going to do this heavy specialization because you have that as a content modeling requirement?

08:46 GK: There are a lot of different costs involved when you do a very heavy, every-single-element specialization like we did for the AJCC. The three main areas where you’re going to have costs are configuration, maintenance, and training. Configuration is where we saw a lot of the cost on this project in particular, because there was a lot of back-and-forth testing as we were building out the specialization and as their CCMS vendor was building out the API. Beyond all that fine-tuning and testing, their content creators were still writing the content while we were developing the content model. They had come up with a template and said, “This is what our typical topics will look like.” But there were a few diseases that had slightly different table structures.

09:48 GK: And then once those chapters were written and came in, and the AJCC began converting them into DITA, we realized that we had to make some pretty significant updates to the specialization to handle those different table structures. I’d like to call them tweaks, but they went beyond tweaks. And we couldn’t just tell the authors, “Make your structures fit the template,” because a few diseases (I think breast cancer was one) had different staging rules that had to be reflected. If you have a clear idea of your requirements up front, there may not be as much configuration cost, but it really depends on the circumstances. This was one where there ended up being a lot more cost and time involved.

10:41 SO: And then, once you’ve built all of this, you have to maintain it. Like you said, something new comes in. We’re talking about medical research information here, so all of a sudden there’s new evidence, or an important new way to diagnose or assess, and that needs to be reflected in the content. That would be a maintenance cost, because you may have to adapt the content model to some new kind of staging table or consideration.

11:10 GK: Absolutely. And that’s something they’re thinking about now, because each cancer staging edition is good for a few years, but they’re already thinking about requirements for the next one. After what they saw with some of the configuration issues, and realizing that there were different structures we needed to reflect, they’re already looking ahead three or four years and asking, “What do we anticipate?” So that’s where their maintenance comes in. And then, of course, training is an interesting area, because a lot of technical content creators may be familiar with standard DITA, but when it comes to specialization, they need to know how all of these custom elements work and how to use them properly, so that the specialization delivers the most benefit.

12:01 SO: And of course, you can have a situation where there’s no training available anywhere in the world for your particular specialization, because it’s unique to that customer or implementation. So then we look at this and say, “Well, that sounds expensive. Why don’t we just not specialize?” That looks good on paper initially, provided that you don’t have really stringent content modeling requirements. But the problem you run into if you don’t specialize is that the content model isn’t as good. It doesn’t reflect what you’re actually writing about, or the information you’re trying to convey, as well. You don’t have those nice labels that say exactly what something is, and you run into the problem of, “I can only use what DITA provides. If I say no specialization, I can’t create my own elements.” So there’s always a push and pull when we get into a project: “Okay, are we going to specialize? If we do, what will that look like? How much of it is there? How much will it cost? And what’s the value of doing that specialization?”

13:14 GK: Right. And it’s interesting. You mentioned not being able to use custom elements if you decide not to specialize, and that often leads to a risk of what we call DITA tag abuse. If you decide, “We’re going to forgo all those specialization costs and risks, just use standard DITA, and make it work,” there will probably be some instances where it’s like trying to fit a square peg into a round hole, and you say, “This element is not exactly intended for this purpose, but we’re going to use it this way.” What would really have been better is to create a specialization for that particular element rather than using a DITA element in a way it’s not intended, because misusing an element strips away the semantic value it was supposed to carry.

14:10 SO: So we started out by saying that specialization keeps you inside the standard, which is better than a fully custom approach that breaks the standard and goes totally off-road. Within that, not specializing is clearly going to be less expensive from a configuration point of view than specializing. But specialization can have great value for content that has unique requirements and needs to be tightly tagged. And I guess that brings us to the alternatives. If you’re considering specialization but also want to look at your options, the default first option is always outputclass. DITA provides an attribute called outputclass, so you can have a p tag and say outputclass equals whatever, for special formatting. You can then use the outputclass attribute to trigger that special formatting on the output side.

15:17 GK: Yeah. So we’ve had several examples of that where there have been customers we’ve worked with that decided not to specialize, but they did need some different formatting on the output side in different cases. And it wasn’t really enough to make the case for saying, “Let’s create a whole new element for… ” For example, if you need a particular thing to just appear in bold or with a box behind it. But that’s where using the outputclass attribute can really sort of bridge that gap.

15:46 SO: Right. The box behind is actually a really good example. You might create a section with an outputclass of pullquote or something like that, which gives you a little sidebar effect, or a section with outputclass equals sidebar. “Do I create an entire specialization for that, or do I just slap an outputclass on it and deal with it in the processing?” And of course, one person’s appropriate use of outputclass is another person’s tag abuse.
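Here is a minimal sketch of what “deal with it in the processing” might look like. The markup stays plain DITA (a p or a section); only the output step looks at outputclass. The SPECIAL mapping and the to_html() helper are hypothetical, and a real pipeline such as the DITA Open Toolkit would do this in its own transformation layer.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical mapping from outputclass values to HTML wrapper elements.
SPECIAL = {
    "sidebar": "aside",
    "pullquote": "blockquote",
}


def to_html(elem: ET.Element) -> str:
    """Render a section or p, letting outputclass choose the HTML wrapper.

    Unmapped outputclass values are passed through as an HTML class
    so a stylesheet can still style them.
    """
    default_tag = "section" if elem.tag == "section" else "p"
    oc = elem.get("outputclass", "")
    tag = SPECIAL.get(oc, default_tag)
    cls = f' class="{html.escape(oc)}"' if oc else ""
    inner = html.escape("".join(elem.itertext()).strip())
    return f"<{tag}{cls}>{inner}</{tag}>"


print(to_html(ET.fromstring('<section outputclass="sidebar">Aside text.</section>')))
# <aside class="sidebar">Aside text.</aside>
print(to_html(ET.fromstring('<p outputclass="pullquote">Quote me.</p>')))
# <blockquote class="pullquote">Quote me.</blockquote>
```

The trade-off discussed above shows up here: the formatting works, but the source still only says p or section, so nothing downstream other than this mapping knows the content is a sidebar or pull quote.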

16:15 GK: Right.

16:17 SO: There’s always that question. There are a couple of other things you can do, but outputclass is definitely a reasonable alternative, especially when you’re dealing with things that are really more about formatting and less about structure.

16:31 GK: Right. And when you do have some structural things, that gets into a point you mentioned a little earlier, which is a more lightweight specialization. We talked about the AJCC, where basically every single element is specialized other than things like paragraphs and notes. But there’s also the option of using DITA mostly “as is,” with just one or two specialized elements, and we have a few customers who have taken that approach. For one customer, pretty much the only thing they did was create a specialized topic-type, which gave them a few custom attributes and, I think, one custom element. Otherwise, they’re using standard DITA “as is,” with just a few touches of specialization to tag the pieces of their content that need extra semantic value that standard DITA can’t provide.

17:35 SO: At the end of the day, it really depends on what you’re trying to do, how closely your information matches the baseline DITA spec, and what you’re trying to do with that content downstream. How heavily does it need to be tagged, and how semantic does it need to be, so that you can do whatever you’re trying to do with the content downstream? If you’ve got something pretty standard (“Well, I’m just doing PDF and HTML and whatever, and my content is pretty standard software doc”), then you probably don’t need a whole lot.

18:10 GK: Right.

18:11 SO: We have just released a white paper that covers this and has a lot more visual examples, which [chuckle] may help, ’cause I know with a podcast, we can’t show you exactly how specialization works. We can only try to explain it. But the white paper has not only explanations, but also workflow graphics that show how you take the original DITA element or attribute and make the custom specialized version. [chuckle] We’re doing a lot of hand waving and…

18:44 GK: Which you can’t see.

18:44 SO: When we talk about specialization, we tend to draw a little virtual box with our hands and make it smaller or bigger as we talk. So I guess this should have been a video podcast, which is a horrifying thought. But yes, I think that white paper definitely has some good examples and some visuals, which were not done by me; they were done by my partner in crime here.

19:05 GK: And what we’ll do, is we’ll link to that and to some other resources about specialization and also to the case study we did with the AJCC in the show notes, so that you can take a look at all of those different resources and learn a little more in depth about specialization.

19:21 SO: And we’d love to hear from you about what you’ve done with specialization, and what your experience has been, and get some additional examples of heavy or light specialization and how that all works.

19:33 GK: Alright. Well, thank you so much for joining me today and thank you all…

19:37 SO: Thank you.

19:37 GK: For listening to the Content Strategy Experts podcast. For more information, please visit or check the show notes for relevant links.