September 14, 2020

Information architecture in DITA XML (podcast)

In episode 80 of The Content Strategy Experts podcast, Gretyl Kinsey and Sarah O’Keefe discuss information architecture in DITA XML and other forms.

“You have to look at information architecture in metadata starting from a taxonomy point of view. This means you are looking at the structure of the content as well as the organization of the data that’s used for search and filtering.”

—Gretyl Kinsey

Gretyl Kinsey:                   Welcome to The Content Strategy Experts podcast brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize, and distribute content in an efficient way. In this episode, we discuss information architecture in DITA XML and other forms.

GK:                   Hello and welcome. I’m Gretyl Kinsey.

Sarah O’Keefe:                   And I’m Sarah O’Keefe.

GK:                   Today, we’re going to be talking about information architecture. So I think the best place to start is just defining broadly what information architecture is.

SO:                   And that sounds so simple and yet we’re going to hit our first snag, because if you go look at this, you’ll discover that everybody in content across all the different aspects of it has an opinion about what constitutes information architecture. I think probably the easiest place to start is to say that if you’re looking at a website, then the way that that website is organized and structured and how the content is hierarchical, you start at the top, you go to the about page, you drill down to the team or the company history, that’s information architecture.

GK:                   Right. And that can extend not just to the way a website is organized, but whatever your delivery method is. So the same thing, if you’ve got a print-based piece of content, it’s that hierarchy, it’s how is it organized into maybe chapters or parts, and that really can apply across all different types of content. I think this is a good place to mention that it is really important to know your terminology and define it, because when you’ve got lots of different types of content that you might be working with, you might get some confusion going on if you don’t really clearly define what IA means.

SO:                   Right. Exactly. And we’ve had some kind of hilarious run ins with this, where we’re sitting in a meeting and we’re talking about information architecture and what we mean is how things are encoded in the DITA files, which we’ll get to in a minute, and it turns out that our counterparts in let’s say content design or UX or something like that are thinking much more about the website delivery layer and nobody is thinking about the print, right? So we have to really be careful about this and be careful to make sure that when we say IA, that we know which one we’re talking about and at which level.

GK:                   Absolutely. You did mention DITA, so I want to talk about that next. So what is the difference when you’re talking about DITA-specific IA? How would you define that?

SO:                   So in DITA, when we talk about information architecture, what we’re usually referring to is how exactly are we structuring the content and marking it up in DITA. So which topic types are you using and what goes into each kind of a topic? Let’s say you have a bunch of reference information. Well, the decision to, for example, put all your terms and definitions into the DITA glossary is, I mean, extremely sensible, right? But that’s a decision and sometimes you might discover that you need a reference topic that the out of the box reference topic doesn’t really help with, so you go down the road of specializing. And then I think, Gretyl, you’ve run into some stuff with DITA metadata as well.
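As a hedged illustration of the glossary decision described here, a single term and its definition might be encoded as a DITA glossary entry topic along these lines. The id value and the wording of the definition are assumptions made up for this sketch and do not come from the episode.

```xml
<!-- Illustrative sketch only: the id and the definition text are hypothetical. -->
<glossentry id="gloss_information_architecture">
  <!-- The term being defined -->
  <glossterm>information architecture</glossterm>
  <!-- The definition of that term -->
  <glossdef>The way content is structured and organized, both in the
    source markup and in the delivered output.</glossdef>
</glossentry>
```

Keeping each term in its own small topic like this is what makes the definitions reusable across deliverables; a specialized reference topic would only be needed where the standard topic types do not fit the content.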

GK:                   Absolutely. And that’s an area where it’s kind of its own thing. You have to look at information architecture in metadata starting from a taxonomy point of view. So it gets into not just what is the structure of the content, but what is the organization of the data about your content that’s used for search and filtering and organizing it and making sure that everybody can find it. So that’s something I think even before you start building out the content structure itself, it’s really important to think about that piece of it, because in DITA, you can have specialized metadata structures as well. So if that’s something that you’re going to need, it’s really important to plan that out and think about it and make sure that’s part of your IA.
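To make the metadata point more concrete, here is a minimal sketch of topic-level metadata in a DITA prolog, assuming a taxonomy with an audience, a category, and a product line. All of the values, and the "product-line" name on the othermeta element, are hypothetical placeholders; a real taxonomy would be planned by the content team before any encoding starts.

```xml
<!-- Illustrative sketch only: every id, title, and metadata value is hypothetical. -->
<topic id="configure_alerts">
  <title>Configuring alerts</title>
  <prolog>
    <metadata>
      <!-- Who the content is for; can drive filtering -->
      <audience type="administrator"/>
      <!-- Where the topic sits in the taxonomy -->
      <category>Configuration</category>
      <!-- Terms that can feed search -->
      <keywords>
        <keyword>alerts</keyword>
        <keyword>notifications</keyword>
      </keywords>
      <!-- Free-form name/value pair for anything the standard elements do not cover -->
      <othermeta name="product-line" content="monitoring"/>
    </metadata>
  </prolog>
  <body>
    <p>Alert settings are managed from the administration console.</p>
  </body>
</topic>
```

If name/value pairs like the othermeta element start carrying a lot of weight, that can be a sign that a specialized metadata structure is worth planning for.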

SO:                   Right. We talk about DITA information architecture, and of course, whatever it is that we do in encoding the DITA files is going to feed into the information architecture on the delivery side, whether that’s website or print or app or it could be any number of other things.

GK:                   Absolutely. So I want to talk about some situations where you might need to deal with both a DITA information architecture alongside other types of information architectures. I think we kind of touched on at the beginning that you might actually have IA coming from different approaches and you have to have that conversation to make sure everybody is on the same page. So one scenario that I can think of right off the top of my head that lots of our clients have dealt with is just when you have different departments that are in different content workflows but they need some way to connect their content.

GK:                   And maybe they’re not necessarily all going to work in the same information architecture, they’re not all going to be in DITA, but each department still has their own IA for their content and they need a way to make it play nicely together, and there can definitely be some challenges and things to figure out when that’s the case.

SO:                   Yeah. DITA versus DITA is one thing, and that can be a challenge. But perhaps that’s not actually the biggest challenge, because you also see DITA versus not-DITA content. So there’s a merger and the company has five departments and eight different authoring tools, and you think I’m exaggerating, but I’m not, right? So you have all these different groups, they have all these different authoring tools, they’re delivering to all these different places, and at some point you have to step back and say, “Well, wait a minute. What about the poor customers that are looking at all this stuff together? How are they going to access this information successfully and what do we need to do to make it consistent such that they can actually have some sort of a fighting chance of finding what they’re looking for?”

GK:                   Yeah, absolutely. And when that sort of thing happens, when you’ve got something like a merger, whether it’s a true merger of companies or just sort of a merger of departments within a company, you’re looking at maybe DITA in one place and then maybe things like Word, FrameMaker, InDesign, all manner of other things in other places. And it’s really important, especially when you can’t necessarily get rid of one of those sort of non-XML flavors of content production, where you actually need something like InDesign for example, to think about the implied structure of that content and the enforced structure of your DITA content and how to make sure that when all of that content is packaged up and published for delivery, it all works nicely together.

GK:                   Again, back to what I mentioned earlier about taxonomy and searchability, making sure that everything is organized in some way that the customers are not going to get confused, they’re not going to start complaining to support that they can’t find what they need. It’s really important to think about, “Okay, if this content is developed in different work streams and different ways but it still needs to be coordinated and shared, how do we make sure that those different information architectures work well together?”

SO:                   Yeah. Then there’s a similar but different use case, right? Which is the, we have all this DITA content, which typically is going to be some sort of technical documentation, technical product content, that kind of thing, that forms a corner of the website. So you have what most of our customers call the dot-com. It’s the main website with all the marketing information. And somewhere on that website, there is a button or a link or something that says documentation or support or additional information or technical literature, literature library. I’ve seen all these kinds of names.

SO:                   And the information that lives in that technical documentation corner of the dot-com is coming out of DITA with its own information architecture, but then the overall website has a, I guess, big picture IA. So one of our pretty common jobs is to try and bring those two things together so that we can feed the DITA based technical information into this corner of the dot-com and make sure that the people accessing it, accessing the website in general, get consistent information, they get consistent user experience, customer experience, on the information that they’re trying to access even though the website was built by one team, probably, and the tech docs were built by a completely different team using a different technology stack, different systems, different everything, but it’s still possible to make them consistent.

GK:                   Absolutely. One area that I’m starting to see more and more is the idea of delivering content through dynamic delivery portals. So that’s another layer that you have to think about. Some of the companies I’ve worked with that are doing this have to think about information architecture on both the backend, so how are they actually structuring the content itself? And then also the front-end, how is that getting delivered through a dynamic portal? How does it have to be tagged and structured to work with the way that the portal actually gathers the content up and delivers it to the customer?

GK:                   Then if you’ve got a portal for a portion of your content, so like you were saying, Sarah, if it’s for documentation or for online help or training or something like that and it sits in a corner of the website, then you have to think about how does that fit in with everything else and are there other departments that are also serving up their content dynamically as well? How can you make that play nicely together? So it really is a lot to think about and to plan.

GK:                   I think one thing I’ve seen that’s helped a lot is just having dedicated content resources or content team that sits above all these different departments and looks at how the pieces of the puzzle come together and can say, okay, “This group over here has one information architecture, this group has another. Here are maybe some tweaks or changes that have to be made to make sure that’s going to work with how you’re publishing your content through a portal onto the website.”

SO:                   Yeah. I think that’s a really good point. And your distinction between back end and front end, I think, can be very helpful to talk about back-end information architecture. How are you encoding the DITA files? How are you encoding the source files, whatever those may be? And then the front-end IA, which is essentially how are you presenting them? How do your end users experience this information? Now, what’s interesting is that you probably want to have some consistency between those two things. I mean, it can be a little challenging if you have a back-end information architecture that in no way represents what you’re trying to do on the front end.

SO:                   That’s probably not going to end so well. But when you start looking at this from a development and a skillset point of view, I think it is actually very helpful to think about, “Okay, we’ve got to do some back end encoding for DITA and we’ve got to do some front end encoding for user experience and we need to make sure that those two things are in fact compatible.”

GK:                   Absolutely. I want to, for the rest of this discussion, focus on that back end specifically and what happens when you’ve got some scenarios where you are sort of merging or bringing different types of content together. One instance that I’ve seen is when you’ve got to take content from other sources into a DITA-based single source of truth. This is something that we’ve seen a lot where you may have a lot of legacy content, you may have content in different types of documents. So you may have a lot of Word files, you may have a lot of FrameMaker files, things that are not working together well when it comes to getting your content all searchable and in one place and reusable.

GK:                   In those cases, people sometimes make the decision, “Let’s bring it all into DITA and really get that maximized reuse that we don’t have right now.” So one big challenge I’ve seen is how do you make that decision of how you’re going to take the content from those other sources and make it work with whatever DITA information architecture you’re going to have. I think this ties a lot into the process of conversion and making those decisions. What content are you going to keep from your legacy content? What content are you going to throw out? Then of course once you’ve decided what’s your important content that you have to deliver, then you look at that implied structure that I mentioned previously. Any content, even if it’s in something like Word, FrameMaker, InDesign, even if it doesn’t have an enforced tag-based structure, is still going to have an implied structure, hopefully.

GK:                   Hopefully it’s not just no style guide and complete chaos all over the place. But typically it does have some sort of an implied structure. You see patterns in the types of headings that you have, the types of content that you’ve got. You may have lots of reference information like you were saying earlier, Sarah, or you may have a lot of task-based information. So looking at that implied structure and seeing how it fits into the information architecture structures that DITA offers by default is a good starting point, and that helps you make those determinations that we were talking about upfront of, do you need specialization? How are you going to organize your metadata? All of those kinds of questions. That’s sort of the starting point, is what’s that implied structure and how does that kind of carry over into an enforced structure?
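As one hedged example of carrying an implied structure over into an enforced one: a legacy "how to" heading followed by a numbered list usually signals task-based information, which could map onto the standard DITA task topic roughly as sketched here. The id, title, and step text are hypothetical and stand in for whatever the legacy document actually contains.

```xml
<!-- Illustrative sketch only: id, title, and all step text are hypothetical. -->
<task id="replace_filter">
  <!-- Implied structure: a heading styled as a procedure title -->
  <title>Replacing the filter</title>
  <taskbody>
    <!-- Implied structure: a "Before you begin" paragraph -->
    <prereq>Turn off the unit and let it cool.</prereq>
    <!-- Implied structure: a numbered list -->
    <steps>
      <step><cmd>Open the access panel.</cmd></step>
      <step>
        <cmd>Slide out the old filter.</cmd>
        <info>Note which way the airflow arrow points.</info>
      </step>
      <step><cmd>Insert the new filter and close the panel.</cmd></step>
    </steps>
    <!-- Implied structure: a closing sentence describing the outcome -->
    <result>The unit is ready to be turned back on.</result>
  </taskbody>
</task>
```

Where legacy patterns do not line up with any of the standard elements, that mismatch is the kind of evidence that feeds the specialization and metadata questions raised above.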

SO:                   Yeah, I think that’s right. In addition, if you think back to the wonderful days of printer technology, there’s this concept of the gamut when you print, which is the range of colors that you can produce on a given printing press with a given set of inks, right? And there are certain colors that you simply cannot produce. So for example, if you want something to look metallic, you usually have to put in a special metallic ink to make that happen. You can’t get metallic out of the traditional four-color CMYK: cyan, magenta, yellow, and black. So the gamut is helpful to think about because… I mean, you mentioned FrameMaker.

SO:                   There are specific things that you can do in FrameMaker where you can get a little creative with the stuff that you’re putting in your files that is really, really difficult to reproduce in another tool or another content model such as DITA, and vice versa. There are things you can do in DITA that aren’t necessarily supported in your legacy tools. So you run into this gamut issue where, because you’ve never thought about doing a thing because it was impossible in your current tool set, you have to really think carefully about, do I want to implement that now as I move forward into the new tool set? Does it add value? And as you said, this is relatively easier if you’re starting from, “I have all this unstructured content in Word and I want to move it to structure.”

SO:                   That is actually relatively easier because you have a wide open sort of blue sky make some decisions situation. It gets really, really interesting if you have a, let’s say an existing DITA set of content, and now let’s say that your organization buys another company and they have Microsoft Word files and you’re going to move the Word files into DITA. Well, now you have a structure. Like, you’ve made some decisions about your content model, do you extend your structure to support what the other company did? Do you say, “Nope, you have to jam your content into the content model we created because we feel like that’s the best approach.” That starts to get really, really squeaky from a technical point of view and it’s also very political, right?

SO:                   Because you just acquired this company and they may or may not be happy about the acquisition and they may or may not be happy about reporting to you because you’re now leading this project, and so you might make some compromises that aren’t the best technical solution but that will keep the peace in these new organizations that you’re bringing together.

GK:                   Yeah, absolutely. I’ve definitely seen examples of that happen where I think a lot of the judgment calls that were made were not so much about what’s the value of the content and what’s worth keeping and what’s worth putting into a different structure and what structure are we going to use? It was more made based on those kinds of political decisions and keeping things running smoothly. And then what ends up happening down the road once you kind of get over that hurdle of a merger happening and things do settle out, then sometimes you might have an opportunity to look at things and go, “Okay, our content processes are still not as aligned as they should be and we need to start thinking about ways that we can make that happen.”

GK:                   And maybe then down the road you can start making decisions that are a little bit more logical and a little bit more based in the content itself. But yeah, when you first start to make that choice and you’ve got a situation where there’s an existing DITA structure and then there’s unstructured content coming in, that’s definitely a place where there can be a big clash over change resistance and coming into a new process that you’re unfamiliar with. So it’s really important to think about the balance of those things.

SO:                   Yeah. And I think that a lot of times, for me or for us, we’re so deep in the technology that we like to hide in the technology and look at that and say, “Well, this is just a pure technical decision.” But nothing is ever purely technical, there are always politics there and there are always considerations of, how is this going to affect the people that are working on the content if we choose a content model that looks like this and then the conversion is going to be super difficult because we chose this really complicated content model? Well, who takes on that pain and that expense of doing the conversion? Are we inflicting that on the newly arrived merged company employees? Because that will almost certainly make them cranky.

SO:                   So there are those kinds of considerations which are really interesting to me that go above and beyond the hard enough already question of, what’s the best information architecture for this content from a markup point of view?

GK:                   Definitely. So do you have any other final thoughts, final advice for how to make an information architecture development process go as smoothly as possible especially in a situation where you might be combining DITA and non-DITA content?

SO:                   Oh, sure. Yeah. I mean, is that all? I would say, define your terms, make sure that when you talk about information architecture everybody is talking about the same thing, or you agree that, “For this meeting, we’re talking about this kind of IA,” that sort of thing. But, define your terms, and I think it’s useful and important to get the entire team up to speed on what that entire IA piece looks like from DITA markup into storage, into rendering and delivery, and whatever else might be happening downstream so that we’re not all just looking at it through our own little lens or our own little peephole and only focused on our piece of it. If you’re a back-end IA person, the better your understanding of the front-end IA, the easier it’s going to be or the better your results will be when you bring those into alignment.

GK:                   Yeah. And I think just to add to that, my advice would be that you can never do enough planning, and so… especially if you do not have major deadline pressure, which I know that’s not the case for most of us. But if you’ve got time to plan, take advantage of it and definitely do as much of that planning that you can even in and around your other deadlines and your other work before you ever start actually encoding and that will save a lot of trouble and a lot of headaches down the road.

SO:                   Yeah. That’s excellent advice, and I’m afraid hard-earned.

GK:                   I think at this point we’ll go ahead and wrap things up. So thank you so much, Sarah.

SO:                   Thank you.

GK:                   And thank you for listening to The Content Strategy Experts podcast brought to you by Scriptorium. For more information, visit or check the show notes for relevant links.