Tips for moving from unstructured to structured content with Dipo Ajose-Coker
In episode 159 of The Content Strategy Experts Podcast, Bill Swallow and special guest Dipo Ajose-Coker share tips for moving from unstructured to structured content.
“I mentioned it before: invest in training. It’s very important that your team knows first of all not just the tool, but also the concepts behind the tool. The concept of structured content creation, leaving ownership behind, and all of those things that we’ve referred to earlier on. You’ve got to invest in that kind of training. It’s not just a one-off, you want to keep it going. Let them attend conferences or webinars, and things like that, because those are all instructive, and those are all things that will give good practice.”
— Dipo Ajose-Coker
- Challenges of moving from unstructured to structured content with Dipo Ajose-Coker (podcast, part 1)
- MadCap IXIA CCMS
- Webinar: Everything is Awesome! A DITA Story: A webinar on Dipo’s experience migrating from unstructured FrameMaker to DITA XML
- Structured content: the foundation for digital transformation (podcast)
Bill Swallow: Welcome to The Content Strategy Experts Podcast, brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize, and distribute content in an efficient way.
This is part two of a two-part podcast. I’m Bill Swallow. In this episode, Dipo Ajose-Coker and I continue our discussion about the top challenges of moving from unstructured to structured content.
So we talked about a lot of different challenges, and I don’t want this to be some kind of a scary episode for people. Let’s talk about some tips you might have for people, as they do approach this move from unstructured content to structured content.
Dipo Ajose-Coker: Yeah. Now, I would always say the first thing is start small and then scale up. You need to take one example of each type of manual. Where I used to work, we had user manuals, pre-installation manuals, service manuals, maintenance manuals, and so on. Some of them are similar in that they’ve got similar types of content, we’re just removing parts of it. But some of them are really radically different. So we took one user manual, and one service manual, and one pre-installation manual, three major types of content. And then you convert that, test it to breaking point. And then, by the back-and-forth that you’re doing in making the conversion matrix, so fine-tuning that conversion matrix, you’re more confident that, when you then throw the rest of the manuals in there, you’ll have a lot less cleanup. I’m never going to say that you’re going to have zero cleanup, you will always have cleanup. But you will have a lot less to do in cleanup, in manually going to look for those areas where the conversion didn’t work.
I mentioned it before, invest in training. It’s very important that your team knows, first of all not just the tool, but also the concepts behind the tool. The concept of structured content creation, leaving ownership behind, and all of those things that we’ve referred to earlier on. You’ve got to invest in that kind of training. It’s not just a one-off, you want to keep it going. Let them attend conferences or webinars, and things like that, because those are all instructive, and those are all things that will give good practice. And share that in between. Maybe have a train-the-trainer type of program, where there’s one person who’s your champion within the company, and who does all the conferences, and does all that. And then comes back, and summarizes, and trains the rest of the staff.
Your migration must be detailed in the planning. You’re basically, “Step one, we’re going to do this. Step two, we’re going to do this.” I create phases of those because you might have to repeat a whole phase again at a different point in time. The phases, for example, verification of the content. Was what I put in what came out? When I compare my Word document and I compare the XML of it, does it match? And then, you’ll do a few things, and then you’ll publish. But you’ve got to verify again because some of those mechanisms, like I said, pushing content at publication, picking the wrong key, using the wrong DITAVAL would create different content. So again, you’ve got to do that verification again. You’ve got two verification phases, in that case.
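The "was what I put in what came out?" check can be sketched in a few lines of Python. This is only an illustration, not the process used in the migration discussed here: the function names are hypothetical, the source document is assumed to be available as plain text, and the comparison deliberately ignores all whitespace, which is a simplification.

```python
import re
import xml.etree.ElementTree as ET


def squash(text: str) -> str:
    """Remove all whitespace so layout differences are not reported as content changes."""
    return re.sub(r"\s+", "", text)


def xml_text(xml_source: str) -> str:
    """Return the text content of an XML string, in document order, without markup."""
    root = ET.fromstring(xml_source)
    return "".join(root.itertext())


def content_matches(source_text: str, xml_source: str) -> bool:
    """True when the source text and the converted XML carry the same characters."""
    return squash(source_text) == squash(xml_text(xml_source))


# Hypothetical sample: plain text exported from the source document and its conversion.
source = "Install the unit.\nPress OK to continue."
converted = (
    "<topic><title>Install the unit.</title>"
    "<body><p>Press <uicontrol>OK</uicontrol> to continue.</p></body></topic>"
)
print(content_matches(source, converted))
```

The same function could be run a second time against text extracted from the published output, which corresponds to the second verification phase mentioned above.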
BS: Yeah, I think that’s actually a really good point. Because we also see that, even when you have a smooth migration of one particular content set, once you move on to a different manual, there might be something unique about that one that suddenly, everything goes sideways when you try migrating. And you don’t have a home, or you don’t have a structure planned for a certain piece of content that you probably didn’t realize existed.
DA-C: I’d say also, you’ve got to be flexible. No matter how much planning you put into place, the plan is always 100% correct until you start executing it. And it’s at that point that you’ve got to be flexible and be able to say, “Okay, well things did not turn out right. Let’s adapt to that.” And by the end of that phase, we’ll be able to take a look back and say that, “Okay, well this went wrong at this point. Can we fine-tune it? Or is it something that we should just anticipate that it will always go wrong?” If you know that it’s always going to go wrong, you’re better able to plan for that. You know that you just need to add that step to the phase, to the next phase, in that check that this was as expected.
Look at the long-term benefits. That translation example, in that first boom, bang, “We already paid for the translation six years ago. Why do we have to pay for it again?” The long-term benefit is that, six years ago, you paid 100 grand for your translation, say. And then, every year, you were paying 20 grand because of every update. So that’s six years of 20 grand, 120, plus your 100 initial cost. Then, you switched over to DITA, where they’ve promised you your translations are only going to cost you 10 grand a year from now on. Yeah. Well, that first hit is going to still be maybe not 100 grand, but let’s say 80. People balk at that and say, “Well, you said it’s going to be 10.” No. Because for the next six years, you’re only going to be paying 10. So in the long term, it is eventually costing you less. Apply that to whatever part of it, of the scenario you want. You find long-term, it’s best.
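The arithmetic in that example can be written out as a short Python sketch. The figures are the illustrative ones from the conversation, in thousands; the function name is hypothetical, and it is an assumption that the 80 first hit is followed by six yearly updates at 10.

```python
def cumulative_cost(initial: int, yearly: int, years: int) -> int:
    """Total translation spend: one initial cost plus a fixed cost per yearly update."""
    return initial + yearly * years


# Figures in thousands, taken from the example in the conversation.
unstructured = cumulative_cost(initial=100, yearly=20, years=6)
structured = cumulative_cost(initial=80, yearly=10, years=6)  # assumed reading of the example

print(unstructured, structured, unstructured - structured)
```

Under those assumptions, the six-year totals are 220 versus 140, so the larger first bill is recovered well within the period.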
If you look at what’s happening today, and I will only mention this once, ChatGPT and training large language models, and that. Well, training large language models on structured content has proved more efficient than just hoovering up content that does not have a semantic meaning to it, attached through the metadata. You know, attributes that you add onto that saying, “This is author information. Or this is for product X, version Y. But there’s a version X as well available.” All of that, if you look at it in the long term, those companies that have already moved to DITA are going to be better able to start quickly switching their content, repurposing it, feeding it to their large language models. Using it to train their chatbots. Their chatbots are better able to pick up micro-content.
If you look at Google today, you search for something and you get this little panel. You know, that YouTube video that tells you which section of the video answers your question. That’s micro-content. And having structured content, because you’ve got smaller, granular pieces of information, enables you to provide that sort of granularity of answers. Your users are going to be happier in the long term.
You need to, let’s say, plan for compliance. We’ve already mentioned that. Look at how you’re going to manage your terminology because that’s another aspect. How are you going to, first of all, tag it? Making that decision is down to your information architect. Which element are you going to use? UI control, or are people still going to be using bold italics around that? And how are you going to enforce that people use the correct elements rather than that non-standard markup?
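One way to enforce that kind of tagging decision is an automated check over the topics. The following Python sketch is an assumption-laden illustration rather than a feature of any particular CCMS: the house rule that `<b>` and `<i>` are discouraged in favor of `<uicontrol>` is hypothetical, as are the function name and the sample topic.

```python
import xml.etree.ElementTree as ET

# Hypothetical house rule: highlighting elements that should not be used for UI labels.
DISCOURAGED = {"b", "i"}


def find_discouraged(xml_source: str) -> list[tuple[str, str]]:
    """Return (element name, text) for every discouraged element in a topic."""
    root = ET.fromstring(xml_source)
    return [
        (el.tag, "".join(el.itertext()))
        for el in root.iter()
        if el.tag in DISCOURAGED
    ]


# Hypothetical sample topic mixing the agreed element with ad hoc bold.
topic = (
    '<topic id="t1"><body><p>Click <b>Save</b>, '
    "then <uicontrol>Close</uicontrol>.</p></body></topic>"
)
for tag, text in find_discouraged(topic):
    print(f"<{tag}> used around '{text}'; check whether it should be <uicontrol>")
```

A check like this only flags candidates. Whether a given bold phrase really is a UI label is still a judgment for the author or the information architect.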
Localization is another area that you need to … First of all, warn all your stakeholders. If there’s people that are going to be people for … Explain. Give this example that I just gave, that in the long term your translations will end up costing less, the turnaround time will be faster, and so on. And, those issues that we used to have in that world, there was an update while it was out for translation, and then we had to pick up the PDF and highlight all those points that changed in between those two translations. That used to be such a headache for us.
BS: Those were the worst.
DA-C: Totally. And your CCMS is able to do that for you, in that it’ll send only the changed content. It can lock out content, it can lock out things that you don’t want translated.
There’s nothing worse than sending your translations out, and you know that all your UI variables have been pre-translated as string files, and what you’re doing is just importing those and that then puts the correct term inside of those tags. Well, if you send it off and then your translators then decide, “Well, no, I think that’s a better translation for that UI label that is inside,” you’re just causing a whole load of trouble that’s going to come up and catch you later. I’m speaking from experience, again. Things that will get changed during a translation, your system can lock those things out.
Another top tip is to invest in a quality translation service provider. Having a translation service provider that understands structured content is better than one who is just used to doing words translations all the time. They’re better able to understand the concept of, “Well, this topic is reused, so when I’m creating my translation, I must also translate with reuse in mind.” Looking at not breaking tags in content, not moving things around in the content, all of that training needs to be present as well on your translation service side.
And, you’ve got to leverage your technology for efficiency. Major tip there is create workflows, create templates. Templates will help your authors know that, “Well, for this topic type, these are the sorts of information types that I need to put into it. This particular topic needs a short description, and this one doesn’t.” So by picking the right template, they’re guided. They can concentrate, they can focus on creating their content.
Workflows. Oh God, workflows. That’s another big one in that review and approval workflows. What has been reviewed, what has been approved? If you’ve got content that’s already been approved, and then somebody goes and makes a change to that already approved content where it was not due for a change, that will cause problems during your audit. Because remember, you said you could prove to them that this topic was at version X, and we didn’t touch any other topics. Well, if you sent everything off, and then an SME made a change to one of the topics because they saw a mistake in there.
Well, that’s not a good enough reason, when it comes to audit. That, “I saw a mistake, so I made that.” No, you need to follow engineering change management processes, which say that for every single change … I’m talking in regulated industries. For every single change, I must have a reason for change. “I saw a typo in the text and I just decided to change it” is not a good enough reason. If you saw that, then you must create a defect and add that to the change log that you’re submitting to say that, “We changed these. Oh, and by the way, we were trying to fix this error. But as we were going through, we saw that somebody did not put any full stops in all the sentences in this topic, so we decided to raise that as an improvement opportunity, and we added it to the docket.” So we have a reason why those other topics, which were initially analyzed as those are the ones we need to change, what are these other topics that got changed? Well, we also created a ticket for that and put it in there.
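The rule that every changed topic needs a recorded reason can be expressed as a simple audit check. This Python sketch is hypothetical throughout: the topic names, the ticket identifiers, and the idea that the change log is available as a mapping from topic to ticket are all assumptions made for illustration.

```python
def unexplained_changes(changed_topics: set[str], change_log: dict[str, str]) -> list[str]:
    """Return the changed topics that have no ticket or reason recorded against them."""
    return sorted(
        topic
        for topic in changed_topics
        if not change_log.get(topic, "").strip()
    )


# Hypothetical data: topics that differ from the approved baseline, and the submitted change log.
changed = {"install", "safety", "cleaning"}
log = {"install": "DEF-101", "cleaning": "IMP-7"}

print(unexplained_changes(changed, log))
```

Any topic the check returns is one that would need a defect or improvement ticket raised before the change log is submitted.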
So leveraging workflows will allow you to force things to go also to the right person. How many times have you forgotten to send it through to legal?
DA-C: Using the final approval workflow, make sure that okay, well the initial engineers are excluded from that because they’ve already done their workflow, but we’re sending it for that final boss-level approval, and legal can finally sign off on it. Those are the things that are parts of what your tool can do.
Your tools can also help you find out what went on where. By being able to roll back, “Well, we made this change. We thought it was an improvement, but eventually it was just a stop-gap, we’ve made a better one. Let’s roll back to before, and then create that new one that documents this.” Well, your toolset, your CCMS is able to do that for you. We used to have to do this, again talking from experience, going into the archive database, looking for one that was roundabout the date of the change that we made, picking that one out, unzipping it. And then, the whole load of trouble.
BS: I remember doing that.
DA-C: Use and leverage technology. Yeah.
BS: I remember doing that quite a bit, especially when we’d have someone from legal running down to the engineering floor and saying, “Hey, we need to find X version from X date, and see if it contains this particular sentence.”
DA-C: Yeah. Yeah.
BS: That was always fun.
DA-C: Oh, yeah. Totally.
BS: And then, needing to roll back and then reissue all the other following versions with the correct change.
DA-C: That was always a nightmare. I can remember, there was one particular incident where someone, again, had gone off on holiday. Again, ownership of documents and so on. This change had to be made. There was a stop shipment, which means there was a defect found and the regulatory body said, “You’re not allowed to sell any more until you fix this, and you make sure that it’s all done.” So panic stations, everyone. This person’s on holiday, so we go into the archives, look through, find what we thought was the right one. Only, that person had not checked in the real last version. So the corrections were made to the last but one version. And then, when you published it, some of the information that was supposed to be in there was not in there. But we were looking for that specific phrase, we found it. We thought, “Yeah, everything’s good.” Only by the time it goes out and gets off to the regulatory body. Then they say, “Well, what happened to all these other changes then?”
So investigation goes on, and then you’ve got to find out why. Those are all parts of the reason that pushed this organization to say, “Look, we need something that handles this a little bit better.” We had a stop-gap interim period where we introduced an SVN system, but that was on a local computer, and we were able to recreate repositories on everyone’s. But that relied a lot on discipline as well. People checking in stuff. And you could always break locks. I spent so much time fiddling with the SVN system on every update. It was just a lot, too much. The CCMS was able to resolve, let’s say, 80% of all those kinds of issues. I’ll never say that a tool is 100%, but it does help quite a lot.
BS: Yeah. Having had some SVN or Git collisions in the past that we’ve had to unwind. Branches, upon branches, upon … Yeah. Having a system that can at least manage some level of that automatically is a godsend.
BS: Well, Dipo, thank you very much. I think this will pretty much wrap the episode. But thank you very much for joining us.
DA-C: Oh, thanks for having me.
BS: Thank you for listening to The Content Strategy Experts Podcast, brought to you by Scriptorium. For more information, visit scriptorium.com or check the show notes for relevant links.