March 24, 2025

Discovering the Basics of DITA with LearningDITA (webinar)

In this webinar, Sarah O’Keefe shares the basics of DITA—what it is, why it’s crucial for creating structured content, and how it revolutionizes consistency and efficiency in documentation. By exploring core elements such as topics, maps, and metadata, along with DITA specializations like task, concept, and reference topics, you’ll learn why organizations around the globe use DITA to craft modular, reusable content and put it to work.

You’ll be introduced to a self-paced, online DITA training resource called LearningDITA. Lessons include exercises, links to additional resources and videos, and quizzes to test your knowledge.

What DITA offers is a mechanism for extensibility that doesn’t break the standard. If you’re going to try to build out a system that is futureproof, as best we can without knowing the future, then we need flexibility. We need the ability to change things as we go, to extend, to add new output types, to add new semantics, to add new metadata, to add new systems into the equation.

— Sarah O’Keefe

Transcript: 

Scott Abel: Hello. If you’re here for discovering the basics of the Darwin Information Typing Architecture with LearningDITA, you are in the right place. Hello and welcome. I’m Scott Abel, and with me, I have brought our special guest presenter today, Sarah O’Keefe. Sarah, can you hear me?

SO: I hear you, and hopefully you can hear me.

SA: I can. I can hear you and see you. That’s step one toward a successful webinar today. Hey, before you share your screen and before you take off and deliver your talk and help us understand LearningDITA, I wanted to share with you the polling results thus far. So of our audience, and you can still take the poll, audience members, if you’d like. I’ll leave it open for a little bit longer. The polling question was, why are you interested in learning about DITA? So far today, the number one answer is 40% of our viewers say that they have a basic knowledge of DITA and would like to learn some more. 25% say they’re new to DITA and they’d like to understand what it is. 17% say that they’re implementing a DITA system, so this would probably be helpful information for them. And then the other 17% just says they want to advance their career. All very solid ideas or reasons for wanting to know a little bit about DITA. What are your thoughts on the results so far?

SO: All right, well, I’m surprised that we have a bunch of people that already know DITA or have some knowledge of it, and I think I’m afraid your payoff is going to be towards the end of the webinar, so drop in your specific questions, we’ll do our best to get to them. I am going to start at the very beginning, which is, as you probably know, a very good place to start. I want to really reset because so often what we run into is that people just assume, oh, this thing’s been around for a long time, you already know what it is. And then they just take off from there. And when I say they, I mean me, right? So what we want to do here is do a little reset and say, okay, let’s go back to the beginning and let’s talk about what this thing is and why it matters and why you might want to go down this road and give you that very sort of gentle and high level and small overview of what’s out there, and then give you a little bit of a roadmap as to how you might go and learn more. With that in mind, what is DITA, right? Where are we going to start? Scott, you already touched on this. So it stands for Darwin Information Typing Architecture, and every one of those pieces means something. Darwin has to do, as you know, with the finches and the specialization into various kinds of niches in the Galapagos Islands. Information typing is a concept in technical communication that you can label a piece of information with the type of information you’re trying to deliver. So, and now I’ve defined information type as information type, which is terrible, but how-to information, conceptual information, reference information, things that you look up. There’s other things you can do. A glossary entry is a specific kind of content, a specific way of packaging up a term and a definition. So information typing has to do with classifying your content into these various kinds of buckets. Architecture, it’s a framework. And then really what is it? It’s an XML standard for technical content.
So it is a framework that allows you to think about how you’re going to organize your content and present your content and work through all of it. All right, so unpacking DITA, what’s inside it? First of all, it provides what we call structured semantic content. All right? We all know what content is, but what about those other two? Structured content in the big picture means that you have templates for your content and, critically, those templates are enforceable. So if you think about your style guide where you say things like, if you have a bulleted list, you need to have at least two bullets, not just one. Or after a heading one, the next level down is a heading two, you’re not allowed to skip to heading four because you think it looks pretty. And we’ve all done it, right, including me. But in a DITA environment, those types of rules are going to be enforceable by the software. So what structured content really means is that you have a framework and you have some guidelines, and you have the ability to enforce those guidelines programmatically. The software will actually enforce them. Okay, now, semantic content. What is semantic content? Semantic content is content that has labels on it that are informative. So instead of a generalized section, you would have a topic or you would have a task, you would have a how-to. You have something that, instead of being labeled ordered list, is labeled steps. So you’re providing more information about what’s going on inside that content, which then leads us down the road of being able to, again, reach into that content with our software and do things with it. Okay? So we have structured content, which means we have predictability; we have semantic content, which means we have labels that are useful and informative. All right, with me so far? I know they can’t respond. Scott hopefully is still with me. Okay, so stepping past structured and semantic content, the other thing that DITA gives you is topic-based or modular content.
Now, DITA is not the only system or the only authoring tool that does this. You see this in a lot of help authoring systems that are sort of topic-based. But what’s a topic? What’s a module? It is a unit of content that gives you a reasonable chunk of information that’s sort of freestanding. So it is a fundamental unit of I am giving you a chunk of content. In an old-fashioned, unstructured workflow, you might think of this as a heading or a section, right? A section with a heading, some paragraphs, maybe it’s got steps, it’s got this, that, or the other thing. In DITA specifically, we do have generic topics, but we also have tasks, how-tos, we have concepts, which is what is this thing? … are very, very often alphabetical. A list of various commands that you can use. And you have glossaries, which is kind of a specific form of reference content. When you take your information and you break it up into topics, then what that buys you is the ability to sort of mix and match those things. You can do a search and say, show me all the how-tos around this particular keyword because we already know what the tasks are because all the tasks are labeled as such. So here’s an example of a task. This is just a screenshot out of an authoring tool for XML. And you can see this, it’s got a heading. It’s about watching wild ducks. This is actually out of our LearningDITA content, which I’ll get to. There’s a field there for a short description, but there isn’t a short description, which is bad of us; we’re supposed to do those. Then there’s a little before you begin, like what’s this thing? What are we doing? And then you have about this task, and then you’ve got some steps. There’s a table in there and oh, I think my note got cut off, but there is one.
So as we sort of break this thing down and look at it, up there, you’ll see that my cursor is actually down in step one, and up top we have these little breadcrumbs that are telling me I’m inside a task, I’m inside the taskbody, inside steps, step, and then cmd for command. So that’s kind of my breadcrumb location for where I am. The before you begin, those are prerequisites. Not the best example of this here. The place where you very often see prereqs is in hardware documentation where it says like, “Before you replace the battery, unplug the device,” that type of thing, or, “Assemble all your tools. You need these kinds of tools and you need this many screws and et cetera. And make sure you have all your stuff ready to go before you start the task.” So those are the prereqs. You’ve got a little bit of contextual information; ducks are great, we love them, et cetera. Or it might say, when you receive a low battery warning, typically you still have this much time before the battery goes completely dead. But once that button over there or that thing in the corner turns red, that’s when you know you’re in trouble. So not directly part of the task, but useful, contextual, additional information. And then I put in a little dotted line because the rest of this is the steps of this procedure, right? Step one, step two, step three, a little bit of additional information, a little bit of a choose from. We’ve got some, here’s some additional information about binoculars and spotting scopes. So this is what a DITA task looks like in sort of a typical authoring tool. It’s not that far off from what you see in Microsoft Word. I mean it’s all pretty much formatted and it’s all there. Now I am going to show you what this looks like in the code view. This is the same thing; I did cut off the end of it because it got a little more verbose, but so the breadcrumbs are still coming from the UI, from the software that I’m looking at this in.
But see that prerequisites, it’s stashed in prereq. Then you’ve got a context and then you’ve got your steps. Now think about this for a second. Let’s say you have a hundred or a thousand or 300,000 topics, tasks. You could go in and say, give me all the prereqs or only show me steps, or I want to go read all these contexts. You can filter those things out because they’re labeled, right? And this is what we mean by semantic content. It is that those labels are not just a paragraph, like a P tag or a para, but they actually provide information about what’s going on inside the system or inside the text itself. So the prereq there does have a P tag inside it. So that’s just a standard paragraph. Because it lives inside the prereq tags, we know that’s the prerequisite. So when we talk about semantic content, this is what we’re talking about. The idea that the label that is on that content is not just a formatting label, like a P tag or an ordered list or a bullet or something like that, but rather information that gives you additional knowledge about what’s going on with that content and then makes it machine processable. So I did put these side by side so that you can kind of take a look and you can see how those things kind of map from one to the other. All right, Scott, how are we doing? Any questions so far?
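For readers without the slides, here is a minimal sketch of what a task like the one described might look like in code view. The element names (`task`, `taskbody`, `prereq`, `context`, `steps`, `step`, `cmd`, `choices`, `info`) are standard DITA task elements. The `id` value and all of the duck wording are invented for illustration and are not copied from the LearningDITA files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Illustrative sketch only: the id and all text content are hypothetical. -->
<task id="watching-wild-ducks">
  <title>Watching wild ducks</title>
  <!-- A shortdesc would normally go here; the speaker notes the sample lacks one. -->
  <taskbody>
    <!-- "Before you begin": what the reader needs ahead of time. -->
    <prereq>
      <p>Gather your viewing equipment and a field guide.</p>
    </prereq>
    <!-- "About this task": useful background that is not itself a step. -->
    <context>
      <p>Wild ducks are easiest to observe near open water.</p>
    </context>
    <!-- The procedure: each step carries exactly one cmd. -->
    <steps>
      <step>
        <cmd>Find a quiet spot near a pond or lake.</cmd>
      </step>
      <step>
        <cmd>Choose your viewing equipment.</cmd>
        <choices>
          <choice>Binoculars, for scanning a wide area.</choice>
          <choice>A spotting scope, for distant birds.</choice>
        </choices>
      </step>
      <step>
        <cmd>Watch quietly and note what you see.</cmd>
        <info>
          <note>Keep your distance so that you do not disturb the birds.</note>
        </info>
      </step>
    </steps>
  </taskbody>
</task>
```

Because the prerequisite paragraph sits inside `prereq` rather than being a bare `p`, software can pull every prerequisite out of a large set of tasks by element name alone, which is the point being made about semantic labels.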

SA: I was on mute, so let me see. Yes, we have one. “I’ve heard managers question whether content should be rewritten specifically for AI. Is DITA sufficient for AI?”

SO: Oh, well that’s a simple yes or no question.

SA: Yeah, there we go.

SO: I mean, they asked a yes or no question. Let me answer that with a not simple, not yes or no. In general, AI is going to perform better if you hand it content that is consistent, that is structured, and that is semantic. All the things we just talked about. And if you’re authoring in a DITA environment, you are going to have content that is consistent, or more consistent we’ll say, because there’s some behavior, it is possible to write really bad DITA. But DITA gives you a framework that provides for consistency, structure, and semantics, which will then make the AI potentially happier. So is it sufficient? Maybe, maybe not. But if you do a good job in your DITA implementation and organize things properly and put in your metadata properly, which I’ll get to in a second, then yes, that will be AI-friendly and it will help you do the things you need to do downstream with the AI. So I hope that answers that question. It’s sort of a qualified yes. It’s not the magic bullet, but if you do it well, then yes. All right, so map files. I want to talk a little bit about map files. So we’re talking about topics, right? So I have all these how to, how to, how to, what is, where is, how is. Great. But I still need to deliver sort of an experience. You can’t necessarily just throw a bunch of disconnected topics out to the world and call it a day. Now in some environments you do, you put them all in sort of a content puddle and then people search and they get the information they need or to the question’s point, they use some sort of a chatbot that’s running AI to reach into the content puddle and get out specifically what I need. But very often I also still need to deliver either a document like a PDF or even print and/or a help system. So some sort of navigation and some sort of helpful context of where I am in the system. And for that you need map files. 
So a map file is going to let you take all of your topics and turn them into a sequence with a hierarchy and put them all together and say, okay, my book about ducks has all of these topics in this order and in this hierarchy. So if you think about the map file as being basically the table of contents of a book, you’ve got the top level, you’ve got your preface, front matter, whatever, and then here I have wild ducks. That’s going to be some sort of a main chapter title. And then types of wild ducks, wild duck species, and watching wild ducks are going to be subordinate headings inside that chapter, in a book metaphor. If we’re in more of an online help, like a tri-pane help interface, then this would most likely turn into a left panel navigation that you can click to navigate to each of those topics. And then probably we’d have also search and some other things that you could do with that. So the map file is going to give you sequencing, what order these things belong in, the hierarchy of what things are children of other things, what gets a heading one versus a heading two versus a heading three, and it gives you that sort of collection of these are all the topics that go together to talk about this particular subject matter or product. And so looking at this, you can see I’ve highlighted watching wild ducks, which was my sample topic from earlier. It’s very, very common in technical documentation that you need to reuse topics, right? You have something like this and it’s being used in my book about ducks, but also in my book about just bird watching in general, and also in my book about the binoculars that I want you to buy. They might each have the same chapter about watching wild ducks, or sorry, this topic about watching wild ducks, therefore we can put it into multiple map files.
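As a rough sketch of the map described above, the hierarchy can be expressed with nested `topicref` elements. The file names are hypothetical, and a plain `map` is assumed rather than a `bookmap` with front matter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Illustrative sketch: all href values are hypothetical file names. -->
<map>
  <title>Wild ducks</title>
  <!-- Parent entry: becomes the chapter-level heading. -->
  <topicref href="wild-ducks.dita">
    <!-- Nested entries: subordinate headings, in this order. -->
    <topicref href="types-of-wild-ducks.dita"/>
    <topicref href="wild-duck-species.dita"/>
    <topicref href="watching-wild-ducks.dita"/>
  </topicref>
</map>
```

A second, equally hypothetical map can point at the same topic file, which is how one topic ends up in several deliverables without being copied:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Illustrative sketch: reuses watching-wild-ducks.dita from the map above. -->
<map>
  <title>Choosing binoculars</title>
  <topicref href="choosing-binoculars.dita"/>
  <topicref href="watching-wild-ducks.dita"/>
</map>
```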
So a single topic can be in lots of places and get reused, and that’s one of the ways in DITA that you can efficiently reuse information and leverage it so that you don’t make copies and write it over and over again. I can say lots more things about map files. There’s a ton of stuff you can do with them, but just think of it as a table of contents and off we go. I wanted to touch briefly on metadata and somebody asking about AI, this is actually a really, really critical concept. So metadata is additional information about the topics, and you can get very sophisticated with metadata, and this is where you start hearing people talk about taxonomy and ontology and other scary things, but at a high level, you’re going to have three types of metadata typically. You’re going to have administrative, classification, and filtering. Administrative metadata is stuff like, I wrote this topic and it was last updated on this date, and it’s in a review status of some sort. In most cases, if you’re in a content management system, the administrative metadata will be handled for you. If I open a file and make some changes to it, the system will keep track of the fact that I made those changes so it sticks my name on it, that type of thing. You have classification metadata. So this is along the lines of this topic belongs to this product or it belongs in this category of information. So it’s ways it would, if you think of a faceted search, right, so I’m on the front end, all these topics have been put online and I personally am the end user and I’m trying to search to find a specific piece of information. You know how you can, if you’re searching for shoes, right? You put in a shoe size, you put in a shoe width, very important if you’re me, you put in heel height, also very important, and you put in maybe the shoe type or even a brand, and it filters from 10,000 pairs of shoes down to more like 200, and then we scroll through those and have some fun shopping.
But in a documentation metaphor, you do essentially the same thing with the classification, right? I want to see the how-tos, I want to see this version, I want to see the topics updated in the last month, show me those things, and it will filter it down for you and give you a reasonable list of results. Filtering is similar, but filtering is usually an authoring process. Let’s say I have a birdwatching topic and I want to include some information in the binocular version versus the getting started with birdwatching or the all about wild ducks thing that is different. So I have an extra paragraph that I want to put in one place but not the other. As an author, I have the ability to apply metadata to a paragraph in order to filter it in or out of various things. So I have a topic with let’s say four paragraphs in it, but when I push it to one output, it only gets three paragraphs because that fourth paragraph is unique to the other deliverable. That is, well, you hear it called conditional text, sometimes it’s called profiling or filtering. Those are all the same thing, and there’s extensive support for it in DITA in the metadata. And that is true at the topic level, at the paragraph, the block level, and also down at the character level, even phrases within a sentence. I would strongly encourage you to not do conditionals at the phrase level, that leads to tears when you go to translate your content because of grammar issues. But start at the top with the topics, work your way down to the blocks, the paragraphs or the paragraph ranges, and then maybe consider whether you really need to go down to sentences. Probably you don’t, at least not this week. Okay, so metadata, right? And what does this look like? Well, here’s just an example of creating a map and it has some author information in it and some dates; it was created on this date, it was revised on this date. Again, this is in the XML authoring interface, which looks fine. 
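As an aside on the paragraph-level filtering described above, here is a minimal sketch of the four-paragraph case. It assumes the standard `product` conditional attribute and the value `binoculars`; the topic content and `id` are invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- Illustrative sketch: id, text, and the attribute value are hypothetical. -->
<concept id="birdwatching-basics">
  <title>Birdwatching basics</title>
  <conbody>
    <p>Start early in the morning, when birds are most active.</p>
    <p>Move slowly and keep noise to a minimum.</p>
    <p>Keep a notebook of the species you see.</p>
    <!-- Only this paragraph is tagged, so only it can be filtered out. -->
    <p product="binoculars">Adjust the focus wheel until the image is sharp.</p>
  </conbody>
</concept>
```

A DITAVAL file then tells the build what to do with that value. With the file below, the output would get three paragraphs; without it, all four:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative DITAVAL: drop anything tagged product="binoculars". -->
<val>
  <prop att="product" val="binoculars" action="exclude"/>
</val>
```

The same attribute can be set on a whole topic reference in a map or on a phrase inside a sentence, which is the range of levels mentioned in the talk.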
If you take this over to the code view, then you’ll see that we’ve actually embedded a bunch more information. If you look down at the bottom where it says dates, those are the critical dates for this document, so 2016/03/07 was the creation date, and then there’s a revised modified date. And then up top, you see we have these names of people that authored some of this content, but also there’s a link in there. And so the link points back to our website because these were some of the people on the Scriptorium team that authored this content. And you’ll see a scope external in a format HTML, which basically says this link points to an HTML website that is potentially far away. So that’s a pretty good example of embedding additional metadata onto the content itself, because visually I just see the name, but then when you look inside the metadata, inside the tags, you’re seeing additional attributes and additional content in there. All right, so switching gears a little bit, and I want to talk a little bit about the business case for DITA and why it matters. And this goes right back to that first question we got. The bottom line, baseline reasons for considering DITA in your world, in your content world, are these four. It is machine readable, it’s automation friendly, and therefore AI friendly, it is semantic, right? It has labels on it that mean something, and it’s extensible. And of these four, I sort of think the two in the middle. I mean, automation is almost like a prereq, right? But you have a lot of things, a lot of different systems out there that can be automated and can be machine readable. Semantic, useful labels, and extensible. 
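A minimal sketch of the kind of map metadata described at the start of this passage might look like the following. The `author` and `critdates` elements and the `scope` and `format` attributes are standard DITA. The author name, the URL, and the revision date are placeholders, and the creation date is written in the YYYY-MM-DD form on the assumption that this is how the file stores the date quoted in the talk.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Illustrative sketch: name, href, and revised date are placeholders. -->
<map>
  <title>Creating a map</title>
  <topicmeta>
    <!-- The reader sees only the name; the link lives in the attributes. -->
    <author href="https://example.com/about/" format="html"
            scope="external">Author Name</author>
    <critdates>
      <created date="2016-03-07"/>
      <!-- Hypothetical revision date. -->
      <revised modified="2016-04-01"/>
    </critdates>
  </topicmeta>
</map>
```

Here `scope="external"` and `format="html"` mark the link target as a web page outside the documentation set, which matches the description of a link pointing back to a website.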
Like we can start with the core DITA, but then we can go from there out into more and better stuff, that is really, really, really important because when you start building extensions, if you start with a core, whatever you build, and then you say, “Oh, I have this feature I need and it’s not there, so I have to customize.” And when you start customizing, what happens in general is that you break off of wherever you started and you build a custom version, and now you have to maintain the custom version forever because that is now yours or your company’s. What DITA offers is a mechanism for extensibility that doesn’t break the standard. So when I look at these four things, this is what I’m really trying to tell you. If you’re going to try to build out a system that is futureproof, as best we can without knowing the future, then we need flexibility. We need the ability to change things as we go, to extend, to add new output types, to add new semantics, to add new metadata, to add new systems into the equation. People come to me and they say, “Well, okay, I’m going to put everything in a DITA CCMS, great, but oh, I need to connect it to,” and then they say words like Salesforce or SAP or a product information management system, a PIM, or a product lifecycle management system, a PLM. And weekly, somebody says, “Have you ever connected it to X?” where X is something I have never heard of and have to Google. So the most common ones I get are, “What about SAP? What about Salesforce? What about this? What about that?” Great, we see those a lot. But they’ll say, “Oh, we have this system,” and then it’s a custom homegrown internal thing, nobody’s ever heard of it, but, “Oh, we built this thing instead of buying fill in the blank common thing. Can we connect to it?” Well, I mean we can, probably, assuming there’s some sort of a connector interface either going in or coming out, but we need that flexibility.
And what DITA in general and XML give you is the ability to interoperate with all of these systems because we’re not bound into a particular technology stack. And so in DITA specifically, we have something called specialization. Now specialization is its own webinar. We’re not going to spend a lot of time on this, but what you need to know is that in DITA you can create additional tags. If the tag set that is there does not make you happy or does not meet your needs or does not provide you with the metadata values that you need, you can extend, you can add new tags, you can add new metadata, you can change the values of the metadata. And when you do that, if you use the specialization mechanism, then your customized DITA, your specialized DITA is still valid DITA and it will work in DITA-based tools that understand specialization, which is or should be all of them, right? If it doesn’t understand specialization, it’s not really a full DITA tool. So you can say, “Here’s the DITA standard, it’s not quite right for me, so I’m going to modify it.” And that’s specialization. The other thing you can do is you can look at the tags that are in there and say, “You know what? This is too many tags. I only need a subset of them,” and you can exclude tags, which is called constraining. So you can create constraints and just throw out all the tags that are not relevant to you. And that’s how you get from this sort of big scary standard with a ton of stuff to something that you can adapt to your requirements and still have it be valid in the DITA ecosystem. All right, so having dismissed specialization in two minutes, which is a horrifying, horrifying thing to do, I want to talk a little bit about the DITA ecosystem and what this looks like generally. You can, of course, author DITA in … I mean you can author in a text editor, just a plain vanilla Notepad kind of thing. Not a lot of fun, but can be done. 
You can use an XML editor, and then there are numerous flavors of XML editors. Some of them are connected into a content management system, which we’ll get to. Some of them are kind of standalone. Lots of options there. So you have this authoring layer where you’re creating content, you’re editing in maybe more of that, not WYSIWYG, right? It’s not what you see is what you get. We like to call it WYSIWOO, what you see is one option. So you have an authoring interface, it’s reasonably approximating a word processor, or you could be hardcore and go into a text editor or you could take something that’s even more stripped down and maybe even forms based. So that’s the authoring layer. You then have a storage layer. Now that could be your file system, you can go bare metal and work on the file system, but usually what you have is a component content management system, a CCMS. And as Scott said, Heretto, who’s sponsoring this particular series, is one of those CCMSes. So you have storage, and what it allows you to do is stash all those topics we were talking about as individual bits, chunks, in the system. You can actually store even smaller chunks than that if you need to. But the storage mechanism is to keep track of all these many, many topics that you’re creating and then the collections, the map files that go with that. The other thing you’re probably going to see in the storage layer is a translation management system. If you’re doing translation, you or your translation vendor probably has a translation management system and they stash some things in there. So we have storage, and like I said, one or maybe many authoring tools that connect into whatever your storage approach is. And then we’ve got delivery down on the bottom. So delivery is your output. I’ve put content delivery portals here, web servers, web CMSs, Salesforce, Zendesk, PDF. We keep trying to get rid of PDF, and we keep not being allowed to get rid of PDF because it’s useful.
The idea though is you store all your topics up there and then when you deliver them out, you push a button and you render the thing that you need. You don’t spend your time formatting; all of that gets automated away. All right, so that brings us then to the obvious question, which is, do you actually need DITA? I’m quite unclear as to whether DITA is represented by the peanut or the blue jay or maybe something else entirely, but cute picture, so I went with it. All right, do you need DITA? Well, I don’t know. Let’s talk about what it buys you potentially. These are the six most common things that we work through when we’re talking about DITA. So do you need structure? Do you need that enforcement, that a topic needs to be organized a certain way, and I don’t want my authors to just sort of go organize … Every one of us is special and every one of us has our own way of organizing the content and we’re just not going to be very consistent. Do you need structure? Do you need semantics? Do you need those useful labels that say, I’m a note or I am the prerequisites for this particular topic? Is that something that helps you in your authoring environment or in your content operations, in your content ecosystem? Scalability is a big one. If you are producing a lot of content and especially a lot of content across a lot of languages, the more you have and the more complex it is, the more likely it is that you’re going to look at DITA as a solution. If you have 20 writers and you’re going into 20 languages, almost certainly scalability is going to be your top concern, and almost certainly you can justify going into a DITA system. If you have one or two authors and one language, you might still be able to justify it, but you’re not going to have that huge scalability issue at a smaller scale. We talk about velocity. How fast does your content need to get out the door? 
Can you afford to stop and format it and reformat it and re-reformat it and, “Oh, my auto numbering isn’t working. What do I do?” If you want the ability to push a button and get your PDF output, push a button and get your HTML, push a button and push the content into a Salesforce or something like that, that’s velocity. And if you need velocity, the more velocity you need, the more you need automation. And for that, you need a framework such as DITA, but not exclusively, to make that happen for you. Versioning. This is a big one. What we’re talking about here is the idea that you have content, you have a bunch of different topics, and they overlap, right? You have, let’s say you produce software and you have some sort of an introduction to our product or even what is a relational database and what’s the difference between a relational database and a knowledge graph? So you have these sort of core concepts that you need to communicate to your end users and you put them in every product or in every set of product documentation. If that’s the case, then you want to reuse and you want to version across all of those different deliverables. And when you’re doing that, you need versioning and you need version control. So I’m talking here about filtering and variants and conditionals. There’s also the issue of versioning in the sense of this client over here has release 11 of our software and this client over here has release 12 of our software and we’re branching. So now I’m talking about actually like a source control kind of versioning. And we need to maintain two or more separate versions of the documentation live with huge amounts of overlap. The more of that you have, the more likely it is that you need something complicated along the general lines of DITA. And then finally, do you have a business case? Because we can talk about all these other pieces, but is it worth the investment?
Is it worth converting all your content from wherever you live now over into a new system like this? It’s a significant uplift and investment. It takes time, it takes money, it takes learning. So is that a sensible thing to do? So the answer to do you need DITA, is evaluate these six things and see where you land. We can help you do that, we’ve got some calculators on our website, but the broad answer is the bigger and the more complex your environment is, the more likely it is that this will help you. So if you have five or 10 or 15 or 20 writers and you’re in something like InDesign and you’re struggling with technical documentation, you can’t get it out the door fast enough, and localization translation takes too long, you need to take a strong look at this. Not saying it’s the answer for you, but in my experience, something to explore. All right, so before I cut over to talking about LearningDITA.com for a few minutes, Scott, anything else that we need to address before I cut over to the how do I learn this thing part?

SA: Nope, you’re right on the right path. Go ahead.

SO: Alrighty. So we have a LearningDITA.com site and it has a bunch of DITA classes in it. It’s been around for 10 years; we just rolled out an updated version of it, which by the way is why this webinar was delayed. Intro to DITA is out there, it’s free, it covers some of what I’ve covered here, but I would say it goes more in depth into why does this matter and why do you care and what are some of the fundamental concepts? And then there are eight additional courses there. They’re all self-paced and they’re $15 apiece so you can get all of them for less than a hundred bucks. And here’s the list. So I want to zoom through all of this, give you a coupon, and then we’ll take some questions. So this is the list of what’s out there. And you’ll see it goes from very basic, like what’s a concept and how do I author, all the way to the learning and training specialization, which is an additional add-on to DITA that allows you to author training content and e-learning, classroom training, that kind of thing. On our roadmap, additional to these nine courses, is DITA 2.0. We have done a bunch of the work for those courses, and that is coming sometime around the time DITA 2.0 gets released. Don’t ask me when that is. I don’t know. We’re looking at doing some more advanced courses. We are considering doing live instructor-led classes as opposed to self-paced. And we would really like to hear from you. So I’ve got some contact information and I think it’s in the attachments as well. What are the courses that you need? What’s the stuff that you really need to do? I heard earlier this week from somebody who said, “I need Intro to the DITA Open Toolkit for Developers.” It’s like, “I don’t need how to do scary things in the Open Toolkit. I just need to understand the framework. My developers can figure out the rest.” That’s an interesting one and we’re definitely looking at it. All right, here’s the payoff. There is a coupon code.
It is valid until apparently June 20th and it will give you 25% off the nine-course bundle, which I think lands you at about $75 for the whole thing. So, something to consider. I’ll give you a second to capture that before I jump over to the slide with my contact information. And Scott, with that, I’m going to throw it back to you and I am going to leave up some email addresses for a few minutes and then I’m going to turn it off so that I can see you.

SA: So I’ve got some questions for you. So the first question is, what’s the difference between a DITA compatible CCMS and a standalone XML editor?

SO: What is the difference between a DITA compatible CCMS and a standalone XML editor? Okay. A standalone XML editor is an authoring tool, like a Microsoft Word that’s sitting on your computer on your desktop. Well, actually, I guess it could be a web editor like Google Docs. But you type your stuff in there and you save your file and that’s it. A DITA CCMS, a component content management system, is a repository or a storage layer that allows you to stash all your content. Now how is that different from putting a folder on my desktop or a folder on my local hard drive? The content management system allows you to typically control those files. So if I’m working in a file, it will track all the changes that I made and I can roll back versions and do those kinds of things. I can lock a file so that when I’m working on it, you can’t get to it. So it’s for sharing. And, and this is maybe the critical … Oh, it allows you to embed all the publishing infrastructure; instead of having it locally on my hard drive, it would be in a server. So I work there and everybody’s sharing the same infrastructure. And then finally, the maybe most important piece is that if I’m in a CCMS, I can look at a topic and I can say, where is this topic used? Who else is using this topic? Where else can I find this topic? Now, I can do some of that with file search and file names, but a CCMS goes much beyond what you can do sort of at the file level and gives you that better control over your content and your topics. And if you have more than a few writers, then it becomes very important to avoid file collisions. If you’re familiar with software source control, a component content management system, you can think of it as being source control, and in fact, you can get pretty far with source control, but tuned for content and content requirements instead of being tuned for source code. 
So the CCMS is the layer that lets you store and control the information and then the XML editor is the authoring tool.
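
Editor's note: the "where is this topic used?" lookup described above can be roughly approximated outside a CCMS by scanning DITA maps for topicref elements. The Python below is only a minimal sketch of that idea. The map names (install.ditamap, admin.ditamap) and topic file names are hypothetical, the maps are inlined as strings, and a real CCMS tracks far more than this (content references, keys, versions, locks).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical maps, inlined as strings so the sketch is self-contained.
MAPS = {
    "install.ditamap": """
        <map>
          <title>Install guide</title>
          <topicref href="overview.dita"/>
          <topicref href="install.dita">
            <topicref href="troubleshooting.dita"/>
          </topicref>
        </map>""",
    "admin.ditamap": """
        <map>
          <title>Admin guide</title>
          <topicref href="overview.dita"/>
          <topicref href="backup.dita"/>
        </map>""",
}


def where_used(maps):
    """Return {topic href: sorted list of map names that reference it}."""
    index = defaultdict(set)
    for map_name, xml_text in maps.items():
        root = ET.fromstring(xml_text)
        # iter() walks nested topicrefs too, not just the top-level ones.
        for ref in root.iter("topicref"):
            href = ref.get("href")
            if href:
                index[href].add(map_name)
    return {href: sorted(names) for href, names in index.items()}


usage = where_used(MAPS)
print(usage["overview.dita"])
```

In this sketch the shared overview topic is reported under both maps, which is the kind of answer a file search on names alone does not give you reliably.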

SA: That makes perfect sense, and I think that answers that question. So just the second question here is I’ve heard that savings from localization and translation can be used as a way to argue for a DITA CCMS implementation. Is it true that companies can save a lot of money on translation and localization so that they might be able to recoup some of the investment from moving to DITA?

SO: Yes, and that’s one of the most common justifications for moving into DITA from let’s say a desktop publishing environment of some sort. Very rough numbers, right? You can usually get better numbers from your localization team and your localization vendor, but very, very roughly, for every $100,000 that you spend on translation localization, 30 to 50% is going to be formatting and reformatting and the rest is linguistic, like actual translating the words into the other languages. And the other piece is, “Oh, it’s in German and it’s twice as wide now and my tables look terrible,” and somebody has to go in and reformat them. So when you look at DITA and the business case, localization is a great place to start because the more localization you have, the more likely it is that you have formatting cost in there because of your desktop publishing tools, and if so, you can use that to leverage or to … That all gets automated away because all of the formatting is going to be automated and therefore you can squeeze a lot of that out of your localization process. And that’s before you touch on the question of, oh, but if I’m better at reuse, then I’ll have less content to translate, right? Because if a topic is reused, we translate it once and then it will propagate to all the places where that topic is being used, which means I translate one times 200 words, not multiple times. Now if you have translation management, you can address some of that, but what tends to happen is when you make copies, small differences creep in, which is a quality problem, but also increases the cost of localization.
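
Editor's note: the rough numbers above can be worked through in a few lines. The Python below is a back-of-the-envelope sketch, not a pricing model. The 30 to 50% range and the 200-word topic come from the answer above; the number of copies (five) is an assumption made up for illustration.

```python
def formatting_share(total_spend, low_pct=30, high_pct=50):
    """Range of localization spend attributable to formatting (rule of thumb)."""
    return total_spend * low_pct // 100, total_spend * high_pct // 100


def words_to_translate(topic_words, times_used, reused):
    """Words sent to translation: once if the topic is reused, once per copy if not."""
    return topic_words if reused else topic_words * times_used


low, high = formatting_share(100_000)
print(f"Formatting portion of $100,000: ${low:,} to ${high:,}")

# Hypothetical: one 200-word topic that appears in five deliverables.
print("Copied and pasted:", words_to_translate(200, 5, reused=False), "words")
print("Reused:", words_to_translate(200, 5, reused=True), "words")
```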

SA: I’m going to customize this question for my intent, which is to make it clearer for everybody on the audience. So how about can we make DITA work with GitHub and do you know anything about that? And if so, what would the scenario look like if somebody were trying to do that?

SO: Yes, GitHub is a source control system, right? It lives on the web or on the internet, and you can stash DITA files in GitHub and use it to manage those files under source control. There are some limitations in terms of what you can do with source control versus content management, but again, GitHub has a very attractive price point of free. If that’s something you’re interested in, I would encourage you to take a look actually at the LearningDITA project on GitHub, which is the open source content that is the foundation of the LearningDITA.com site. It’s all written in DITA, and you can kind of see what it looks like to have DITA files stashed in there. But the short answer is yes, you can do it. It is not specifically tuned for content and for XML, but yes, it will work.

SA: I’m sorry, I’m going to switch the camera over to me for just a second, me and you. I had heard also that in shops that have DITA-OT errors or build issues, they might have to do additional debugging steps because GitHub Actions doesn’t provide some kind of a native understanding of DITA-specific content. It wasn’t built to understand DITA, so while you could probably use it to put stuff in there, it doesn’t have the awareness that a tool built for DITA would have. Is that probably a fair thing to say? I don’t know.

SO: I mean, you’re going to do the work somewhere along the way, and for me … So to clarify, the vast majority of the work that we do is CCMS based. We do have some DITA running either on bare metal or in GitHub kinds of things. Yes, it can be done. Is the tool optimized for it? No. And as for the rest of that, Scott, I’m going to refer you to my development team because you lost me somewhere around, DITA Open Toolkit, scary, scary things.

SA: I know. I think when all those software tools get into the mix too, we’re also reliant on our information about the software that we have from our knowledge, and that might’ve been a year or two ago, and everything changes so quickly, I’m afraid to talk about tool specific things except for the categories. One thing is built for one purpose and then they may retrofit it to do something else, but that doesn’t always make it a really great solution for people. So I’m always hesitant to recommend tool-specific solutions without knowing more. Here’s another question that I thought was pretty good. This is in a shop that’s a CI/CD shop and they want to know if the DITA Open Toolkit can be configured to run locally or in a CI/CD pipeline if you want to do automated builds?

SO: Okay, so CI/CD stands for continuous integration, continuous development, delivery. I don’t know, I can never remember. It’s basically a pipeline where if you think about one extreme being we make a bunch of updates and every six months we release a thing, CI/CD is the opposite of that. It’s we’re making these little tiny fractional updates and then we update every day or every week or every hour. And now I’ve forgotten the original question. Sorry. Oh, can we put a DITA Open Toolkit into CI/CD? Yes. Yes, we can.
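
Editor's note: as a rough illustration of the "yes" above, a pipeline step can call the DITA Open Toolkit's `dita` command the same way a writer would call it locally. The Python below is a minimal sketch under stated assumptions: it assumes the `dita` executable is on the PATH, and the map name (guide.ditamap) and output folder (out) are hypothetical. It only uses the documented `--input`, `--format`, and `--output` arguments.

```python
import shutil
import subprocess


def dita_command(map_path, transtype, out_dir):
    """Build the argument list for one DITA-OT build."""
    return [
        "dita",
        f"--input={map_path}",
        f"--format={transtype}",
        f"--output={out_dir}",
    ]


def build(map_path, transtype="html5", out_dir="out"):
    """Run the build if DITA-OT is installed; otherwise report what would run."""
    cmd = dita_command(map_path, transtype, out_dir)
    if shutil.which("dita") is None:
        print("dita not found on PATH; would run:", " ".join(cmd))
        return None
    # check=True makes a failed build raise, which fails the CI job.
    return subprocess.run(cmd, check=True).returncode


# Hypothetical map name; this line only prints the command, it does not run it.
print(" ".join(dita_command("guide.ditamap", "html5", "out")))
```

The same script can be run on a laptop or called from a CI job, which is the point of the question: the toolkit itself does not care where it runs.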

SA: Perfect. And then if our viewer defines a specialization, can they share that specialization with others potentially even outside of the group that they’re working with?

SO: Yes. Short answer, yes. Slightly longer answer, DITA comes with a set of structure definitions, which are called document type definitions, or DTDs. The DTDs are the things that say, hey, a topic, with little angle brackets, has these kinds of tags in it. When you specialize, you basically extend the DTDs using a very specific, there’s a methodology for that. You don’t just go in there and hack the DTDs; that’s bad, don’t do that. So you extend using the approved specialization mechanism. That gives you this nice tidy plugin package that you can share with your coworkers, with your downstream customers, with whomever. This is probably also a good time to mention, because I forgot, that DITA and the frameworks, the DITA Open Toolkit are under an Apache open source license, which means that you can extend and do things and build a new thing on top of it that you then assert ownership over and commercialize. You obviously can’t own the DITA spec itself, but you absolutely can claim the stuff that you build on top of it.
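
Editor's note: one way to see why specialization extends rather than breaks the standard is the class attribute that DITA elements carry, which lists each element's ancestry from the base type to the specialized one. The Python below is a small sketch that reads such a value. The helper names are made up for illustration; the sample value is the class of the task topic's step element, which specializes the base list item.

```python
def specialization_chain(class_attr):
    """Split a DITA class value into (module, element) pairs, base type first."""
    tokens = class_attr.split()
    # The first token is "-" (structural) or "+" (domain); the rest are ancestry.
    return [tuple(pair.split("/")) for pair in tokens[1:]]


def base_element(class_attr):
    """Name of the base element a processor can fall back to."""
    return specialization_chain(class_attr)[0][1]


step_class = "- topic/li task/step "
print(specialization_chain(step_class))
print("Falls back to:", base_element(step_class))
```

A processor that has never heard of a shared specialization can still treat its elements as their base types, which is what makes the plugin package safe to hand to coworkers or customers.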

SA: Excellent. Which begs another question. “Is there a resource that addresses edge cases for topic types without extending them? For example, a topic that has three small procedures that are part of a process, each of which is introduced with a mini concept section. Breaking this into three concept topics and three task topics seems too granular. Is there another approach?”

SO: Yeah, so that’s really an information architecture question, and the short answer is that there are not a lot of IA resources out there in general and let alone information architecture for DITA specifically. So I’m not aware of anything. That is actually on our roadmap of things that we’re interested in adding classes for, is this sort of how do you specialize, how do you do DITA-based information architecture? To the person that left that question, I would be really interested in finding out more about your edge cases. I would also, I think look at what’s coming in DITA 2.0 because there may be some things that you can do there more easily than in DITA 1.3 for that specific issue that you’re describing.

SA: Thank you. One of our viewers is surprised to learn that Adobe FrameMaker supports DITA, which is surprising to me, because I think FrameMaker has been supporting DITA, and XML actually, since FrameMaker 6.0, like 2000, 1999 or something, when it was an SGML tool, which is a related language to XML. The question is, “Can you comment more on FrameMaker and DITA? What do you know about that today?”

SO: Okay. So it actually goes farther back than you think because before we had structured FrameMaker … So, sorry, FrameMaker has two versions. There’s unstructured, which is the desktop publishing tool, templates, whatever. And there’s structured FrameMaker, which is the XML and DITA enabled version. Structured FrameMaker, back in the dark ages, was called FrameBuilder. It was called FrameMaker plus SGML. And as Scott said, SGML is the precursor technology to XML. Please don’t ask me when XML came out, something like 1996. And SGML is well before that. So that was out there. Not widely used except by nerds like me. FrameMaker uniquely has the ability to, you can embed structure into it, so you can do all this DITA stuff that we’ve been talking about including specialization. And when you’re authoring, it gives you what amounts to a preview of the print of the PDF version. So it is possible. The primary issue that I see with FrameMaker today is that 20 years ago our primary deliverable was in fact PDF. Today for most people it’s something online, it’s more like HTML. And so when you sort of get bound into that page metaphor that FrameMaker gives you really, really well, there’s a bit of a disconnect between that and prioritizing the more online stuff. But based on whatever your use case is and what you’re looking for, that might be the right answer for you.

SA: Hey, I was the Adobe FrameMaker fanboy for years, but it was also one of the closest tools that mimicked the desktop publishing environment. Because as you mentioned, the earlier incarnations of that tool were about desktop publishing, and so we were able to just layer on this SGML and then once we started to do that, we realized, wow, we have to componentize our content, and along came DITA as a topic-level presentation method. I really appreciate your deep dive on this beginning stuff. I think it was really helpful for people. There’s one more question that was asked that I think we can slide in here, which is, “Do you know of any tools, AI or otherwise, that might be able to convert docs-as-code content to DITA?” And I guess the question would be, do you want to do that or is that even the right approach? If you want to make both of those things work together, are there pitfalls to doing that?

SO: Yeah. Heavy sigh.

SA: We should have a webinar on that, I’m afraid, but-

SO: Heavy, heavy sigh. Aren’t we out of time? No. Okay. It was a good try. So yes, we have done quite a lot of this and it is somewhere between terrible and awful and no good. The problem that you run into is that if your docs-as-code implementation, whatever it may be, is very highly structured and consistent, it’ll work pretty well. The problem is nearly everybody uses Markdown or its various flavors in order to get out of the structure and the enforcement. So you get these bizarre edge cases and things break along the edges. With that said, a lot of these tools can actually have DITA and Markdown exist side by side. And so we’ve seen a lot of workflows like that where you have some of the topics that are more conceptual and more backgroundy and whatever, they exist in a DITA puddle. And then you have the more docs-as-code content, the code reference, existing in a Markdown puddle, and then you find a way to combine them on publication if you need to. But what I would say is that that approach, sorry, the Markdown to DITA conversion, is more painful than you can imagine. There are a bunch of tools out there that’ll do it, but at varying degrees of fidelity and you need that fidelity and you don’t get it. And so I would describe it more as the entire conversion is a pitfall rather than where are the pitfalls, right? I’ve done it; it’s very unpleasant. Do not recommend. Oh, and whatever you do, do not round trip it. Right? Markdown to DITA and back out to Markdown [inaudible 00:59:10] or the other way. Do not do it. That is a bad idea.

SA: We’re going to have another show about that. I also think we need to talk about those other topics, which we’ll do another day, which is the difference between transforming your content, converting your content, and migrating your content. Because those three words often get bandied about as synonyms, which they are not. Thank you very much, audience members. Please give us a rating on the quality of the information Sarah’s delivered to you today using our one through five star rating system that’s located just beneath your webinar viewing panel. It’s a quick thing, you just click the buttons for the stars that you think we deserve. One is a low rating, five is exceptional. There’s a little field in which you can type some text-based feedback and we’d appreciate it. Thanks to Heretto, the AI-enabled CCMS platform that helps companies around the globe deploy developer and documentation portals that delight customers. You can learn more about their tools at heretto.com. And thanks to Sarah O’Keefe for bringing us today this great webinar, discovering the basics of DITA with LearningDITA. Don’t forget that you can check out the LearningDITA website and get the basic class for free and sign up for those others with the discount code, which I believe is DISCOVERDITA. That information will be available on the LearningDITA website. You can check that out, Google it, take yourself right over there and learn some DITA today. Thank you, Sarah, for joining us. We really appreciate it.

SO: Thanks, Scott.

SA: Okay, until next time, be safe. Be well. Keep doing great work. We’ll see you on another webinar from The Content Wrangler in the near future. Thanks, everybody.