Challenges of moving from unstructured to structured content with Dipo Ajose-Coker
In episode 158 of The Content Strategy Experts Podcast, Bill Swallow and special guest Dipo Ajose-Coker discuss the challenges of moving from unstructured to structured content.
“I think we could make broad categories of challenges as tools, technology, people, and methodologies, and I think we’ll just dive into these because they’re not necessarily independent—some of them flow one into the other. One of the most complex and challenging parts is implementation. Changing over to a new tool also involves changing processes and training the staff. Basically, some documentation teams struggle with that initial learning curve.”
— Dipo Ajose-Coker
- MadCap IXIA CCMS
- Webinar: Everything is Awesome! A DITA Story: A webinar on Dipo’s experience migrating from unstructured FrameMaker to DITA XML
- Structured content: the foundation for digital transformation (podcast)
- Misconceptions about structured content (podcast)
- Unpacking structured content, DITA, and UX content with Keith Anderson
Bill Swallow: Welcome to the Content Strategy Experts Podcast, brought to you by Scriptorium. Since 1997 Scriptorium has helped companies manage, structure, organize, and distribute content in an efficient way. In this episode, we talk about the top challenges of moving from unstructured to structured content. This is part one of a two-part podcast. Hi everyone. I’m Bill Swallow, and today I have a special guest. I have Dipo Ajose-Coker from MadCap IXIA. Dipo, hi.
Dipo Ajose-Coker: Hi there, Bill. Thanks for having me on.
BS: Can you let our listeners know a little bit about yourself?
DA-C: Yeah, I’ve got a background in languages and IT. I did a bachelor’s in that and then, well, almost 20 years ago, I made the move to come over to France, and as teaching doesn’t pay that much, I thought I’d retrain to do something that still combines languages and informing people, and I found a master’s program for technical writing and that’s how I got into that. I did my master’s and I’ve been working in medical devices, financial technology companies as a technical writer, as a technical editor. Then a couple of years ago I got that itch to change professions again. I wanted a little bit more creativity in my writing, and so I went to content marketing, and so now I’m a product marketing manager for MadCap Software representing MadCap Flare, MadCap Central, and MadCap IXIA CCMS.
BS: Excellent. Today we’re going to be talking about how you might be moving from unstructured to structured content and what some of the, I guess, challenges are in that move. I guess we’ll jump right in. I’ll just ask you what is one of the key challenges that people face?
DA-C: Yeah, I think we could make broad categories of challenges as tools, technology, people and methodologies, and I think we’ll just dive into these because they’re not necessarily independent, some of them flow one into the other. One of the most complex parts, the most challenging parts is the complexity of implementation. Changing over to a new tool also involves changing processes, training the staff. Basically, some documentation teams struggle with that initial learning curve. You’ve got to learn a new markup language, you’ve got to learn a new way of writing. Then you also need additional help, mostly from IT. You’re getting teams that never used to be involved in helping you put in your FrameMaker or whatever it is that you’re using. You didn’t need your IT department in setting up Microsoft Word, for example, where that used to be the writing tool. Setting up a CCMS involves a little bit more of a lift that documentation teams might not be experienced with or be comfortable with.
BS: The implementation really checks all of the complication boxes, doesn’t it?
DA-C: Totally. You’ve got so many more people involved and you’ve got time scales and everything as well to consider.
BS: I guess let’s dig a little bit into that. You mentioned conversion, learning a new markup system. What goes into that type of an effort?
DA-C: Okay, let’s look at the first thing. Everyone goes to school, learns to write English, French, whatever language it is, but then when you want to start moving to structured content, it’s usually an XML-based language, XML markup, we say. It’s not real coding, but it is still learning a new vocabulary, if you want, a new syntax, a new way of expressing yourself. The fact that it’s structured then means that, as you do in your own language, you have a certain way of creating a sentence. You have subject, verb, object, and so on in a particular order, and it gives you a particular meaning. That also applies to markup languages. Writers have to learn, in effect, a new language, a new way of expressing themselves that is valid and that the machine at the end of the day… Because we are writing for machinery when you start writing in XML, writing what the machine can understand, so you’re learning a new syntax, a new vocabulary as well.
BS: I guess coming from that angle in learning to essentially write in a different language, there would be some cultural and probably some workflow changes that would need to happen there.
DA-C: Absolutely. Learning that language for some people might be easy and there’s lots of courseware that’s out there that can get you into that way of writing, but it does involve classes, training entire teams, and not everyone might be open to retraining in a new way of writing. Once you have trained those writers and they’ve got up to a certain level, you can only do so much training. Afterwards, the rest comes as experience. Then another big change that your writing teams will have to make is that ownership, that question of “I own this content, this is my…” Owning the source content is something of the past. It’s a cultural change that has to happen within the team in that we’re writing for a team, we’re just contributors now. We contribute to a pool of information and you have to learn a way of writing that makes it that the content that you put into the pot can be used by other people.
My style of writing things might differ from somebody else’s style of writing things. All of those have to start disappearing in the way that the writers actually create that content, and that’s a big change for a lot of people. I’ve worked in teams where during the summer holidays someone says, “Well, okay, look, if there’s any changes, I’ll make them when I come back,” and even if there’s an emergency, they’ve locked down their files, you don’t have the latest versions and so on. You’re having to wait for that person to come back. If your teams, I suppose, one of the ways that you can make the medicine go down better is to let them know that they can own the output.
You own what you put together, and in structured, in DITA, you have the concept of maps and book maps, so, well, they own that because they’re the ones that have decided which topic goes before which, and so on and so forth. Then when they press that button, the PDF or the HTML output that comes out of it, they can sign their name to that. However, in the creating of the content, you must start thinking “I’m writing for a pool,” as they used to have in newspaper poolrooms. Everyone would contribute, and then in the end you have a whole newspaper.
BS: I think that would probably go doubly for any content that certainly is going to be written for reuse so that you are absolutely writing for your team and not for just your particular need.
BS: All right, so going from old to new, let’s talk a little bit about data migration.
DA-C: Now, this part of it is, I think, one of the most complex and the longest parts of that migration from unstructured to structured. You’ve got to make decisions as to how you’re going to convert that content. Are you going to bring in an outside consultancy or are you going to do it one at a time? You’ve got to make decisions as to whether you’re going to continue updating content that is being migrated, whether to use a production and staging server, whether to wait for that pause. If you are lucky to work in a company that does not do Agile, for example, and you have big breaks in between product releases, you could say, “Okay, well we’re going to take that time to then create all the new content.” Do you also want to convert all of your content? If there’s stuff that you’re not going to be updating, this is your chance to get rid of all that stuff.
Just don’t convert it and know that whatever you find inside of your CCMS is what has a life and is able to continue living. Then you also have to consider that no matter how much help you get, whether you’re writing it yourself or getting a conversion done by a consultancy, there’s going to be some cleanup to be done because if your content was written so well in the first place in Word that you could create a matrix, mapping it directly to DITA, there was no real point moving over to DITA.
Basically, that content was good enough as is, so you are going to have to come back and go over the stuff and change strategies as you go along and think, “Okay, well, we thought we’d be able to reuse this, but actually maybe it’s best to have a branch of this or create a duplicate of that topic.” You’ve also got to think a little bit further forward as to how that content is going to be localized, it’s going to be translated, and some of your reuse decisions must also consider that part of it, as well. In that, is it something that is translatable or should we have separate topics, and so we’re able to translate them differently depending on the context and so on. I think that that shows just some of the aspects of that complexity of that data migration.
BS: Yeah, the localization angle is a big one because even if you had a perfect migration, the way that the content is now essentially tagged is going to be different than how it was tagged before. Even if the text doesn’t change, there’s still going to be some segmentation problems, so you’re not going to get that 100% match that you were looking for the first time out. It’s something that we actually caution a lot of our clients with, as well. It’s like, “Expect to take a hit on the first localization pass. You’ll get a lot of leverage, but it won’t be a hundred percent, and then from then on you’ll see a huge improvement.”
DA-C: Yeah, totally. Real-world experience, this is what we went through when I was working with a medical device manufacturer, and we planned pretty much what we thought for everything, and we had that in mind, all the advantages. Oh yeah, drop in translation costs and so on, and that was what was communicated to the engineering teams who were the ones that eventually paid for the technical publications and so on, you know the way companies work, different departments, different budgets and so on. Then we converted everything and it came to that first release and we sent them what we sent out for translation. We got that translation quote back, and it was just a little under what the initial translation was, whereas what we were doing was just an update of some of the content, and we had some explaining to do in that.
“Oh, yes, well look…” Because of the way, and as you said, segments are different, and if you look at the code for a paragraph in Word, you’d put a bold on there, and then that segment goes off into the translation memory, and it doesn’t matter whether it’s bold or not, the words, that paragraph is there as one segment. However, in XML, your bold is actually elements, B elements, before and after, and when the translation management system starts looking through it basically cuts off at that point where it encounters a new element.
It used to encounter P and then end with /P, whatever. With this newly migrated content, it’s going to start off with possibly a P, and then it’s going to come up in bold and then possibly another italics, and end italics and then UI control if you were doing things properly and things like that. Each of those becomes a segment, and so the translator then ends up with, “Well, it matches, but this changes,” and those fuzzy matches do cost you a bit more. Think of when we had to go back to engineering and explain all of that, in that further translations will cost a lot less, but this first one, you’ve got to be prepared to take that hit.
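The segmentation effect described above can be illustrated with a toy sketch. It assumes, purely for illustration, a segmenter that cuts at every element boundary, which is the behavior DA-C describes; real translation management systems treat inline tags in varied ways, and the sample sentence and the `naive_segments` helper are hypothetical.

```python
import re


def naive_segments(markup: str) -> list[str]:
    """Split at every tag boundary and return the non-empty text runs.

    This is a deliberately simplistic model of a segmenter that starts a
    new segment whenever it meets an element; it is not how any
    particular translation tool works.
    """
    parts = re.split(r"<[^>]+>", markup)
    return [part.strip() for part in parts if part.strip()]


# The sentence as the translation memory stored it before migration:
# formatting was ignored, so the whole sentence was one segment.
legacy_segment = "Press the Start button to begin the self-test."

# The same sentence after migration, with DITA-style inline elements.
migrated = (
    "<p>Press the <uicontrol>Start</uicontrol> button "
    "to begin the <b>self-test</b>.</p>"
)

segments = naive_segments(migrated)
print(segments)
# The old segment no longer appears as an exact match.
print(legacy_segment in segments)
```

Under this assumption the unchanged sentence arrives as five fragments, none of which is an exact match for the stored segment, which is why the first quote after migration can look like a new translation.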
BS: Absolutely. Actually, speaking of costs, I’m sure there are others that we could mention here.
DA-C: Oh, yeah. Well, apart from training costs, which we’ve already brought in, while there’s free training, it’s never 100% free, because you are paying your staff while they’re doing that training, and so they’re not producing content, so it’s not free. You’re paying someone to do that, but you really should invest in formal training for your staff. There’s the initial setup costs, so there’s the cost of the software, there’s the cost to your IT department in putting in place all of these things. You might need to pay for someone to create the publication outputs that you need to have if you don’t have that expertise in-house.
You might need to also invest in a content delivery system because you were delivering PDFs before, but part of the whole content strategy is to have everything on a portal, on a website, and so well, there’s maybe cost that’s going to be added on to that. There’s the cost of the conversion. It’s either you’re paying a consultancy to do it or somebody in your team is going to be doing that and not working on the project that they’re normally working for, but these are all costs that will be in there. Some of them can be quite high and some of them would be just normal, one-off costs and so on. We’ve already talked about the translation.
BS: I guess let’s talk a little bit about the challenges of maintaining your consistency, because once you move to structured content, yes, structure has a series of rules. You can’t have this element before this element, and a lot of the systems enforce that for you, but what are some of the other things that you need to be careful about when it comes to consistency?
DA-C: Many teams think, many organizations think that once we’ve got this thing in there, it’s self-policing, if you want, in inverted commas. You don’t need an editor, you don’t need someone to go over that, because you’re overly reliant on the tools. However, you need to know that even if you have these rules on the order of elements that are allowed to be used, you might not want a particular element to appear in a particular type of content. For example, you have short descriptions, a particular type of content that you can add in your editor, but it’s not always appropriate. Well, between the user manual for product X, which is being written by Tech Writer One, and the same thing for another product within the same company, but being written by a different person, one or the other might decide to include a short description, and they’re both valid.
They’re both valid topics. However, why does one have a shorter description than the other? You need that editor, you need someone who’s there to be able to take a look at that sort of thing and to help harmonize content across the different content types that you have. You would have maybe an information architect who’s there not just to help set up that order of elements and help your writers learn how to use and put them, but also who’s there to show good practice, who maybe has a session every month to just say, “Okay, well this is the best way to do this,” or “We found these examples. Could we make sure that we’re all following the guide for this type of manual, and this is the way we do it?”
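The short description example can be sketched as a small consistency check. This is a minimal sketch, assuming topics are available as XML strings; the file names, the sample topics, and the `topics_missing_shortdesc` helper are hypothetical. Both sample topics would pass structural validation, which is exactly why a separate editorial check is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical topics from two writers. Both are structurally acceptable,
# because the short description element is optional in a DITA task.
TOPICS = {
    "product_x_install.dita": (
        "<task id='install_x'><title>Install the unit</title>"
        "<shortdesc>Mount and connect the unit.</shortdesc>"
        "<taskbody/></task>"
    ),
    "product_y_install.dita": (
        "<task id='install_y'><title>Install the unit</title>"
        "<taskbody/></task>"
    ),
}


def topics_missing_shortdesc(topics: dict[str, str]) -> list[str]:
    """Return the names of topics that have no <shortdesc> child."""
    missing = []
    for name, xml_text in topics.items():
        root = ET.fromstring(xml_text)
        if root.find("shortdesc") is None:
            missing.append(name)
    return sorted(missing)


print(topics_missing_shortdesc(TOPICS))
```

A report like this does not decide which convention is right. It only surfaces the difference so that an editor or information architect can harmonize it.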
Terminology is another big one in that, and you can either enforce it using third-party tools that can plug in, or you’d have someone in there making sure that you’ve used this term. When you’re creating terminology lists, it’s not just a list of approved terms. You also should be looking at terms that are not approved.
DA-C: That must not be used.
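A terminology list that covers both sides, the terms that must not be used and their approved replacements, might look like the following minimal sketch. The term list and the `find_deprecated_terms` helper are hypothetical, and this is not the behavior of any specific third-party terminology tool.

```python
import re

# Hypothetical term list: each deprecated term maps to its approved replacement.
DEPRECATED_TERMS = {
    "click on": "click",
    "e-mail": "email",
    "utilize": "use",
}


def find_deprecated_terms(text: str) -> list[tuple[str, str]]:
    """Return (deprecated term, approved term) pairs found in the text."""
    findings = []
    for term, approved in DEPRECATED_TERMS.items():
        # Whole-word, case-insensitive match so "Click on" is also caught.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            findings.append((term, approved))
    return findings


sample = "Click on Save, then utilize the e-mail form."
for term, approved in find_deprecated_terms(sample):
    print(f"Replace '{term}' with '{approved}'")
```

A check like this only flags wording. Deciding which terms belong on each side of the list is still a job for a person.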
BS: Absolutely. I would probably also mention the classic need for style, tone, and voice as well, especially now that you don’t have writers who own their manuals, “This is my manual. I wrote it from cover to cover, it has my voice, or it has my interpretation of the corporate voice in there.” But now you have a situation where you do have that reuse of individual topics in a myriad of different places, and if that style of that tone or whatever changes from one topic to the next, it’s going to be pretty jarring to someone who’s reading the whole piece.
DA-C: Yeah, a simple example is you have a writer who likes to use, “Please do this before you do that,” another writer who just goes, “Do this, do that.” If you are reading from one to the other, that can be really jarring and you might even take offense because you’re so used to the pleases and thank yous from one author, and then you get into this topic, which is actually a troubleshooting one, and you find you get this tone that they’re telling you off, whereas it was just a difference in style that should have been enforced globally.
BS: Yeah, equally jarring going from one topic to the next, active voice, passive voice, active voice, passive voice.
DA-C: Oh, yeah.
BS: Let’s see. We’ve got translation challenges, consistency challenges, some cost implications there, migration, overall cultural issues, and just the overall complexity of doing all of that work. Is there anything else we should mention here?
DA-C: Regulatory compliance.
BS: Ah, yes.
DA-C: I’ve worked in regulatory for pretty much all of my technical writing career, so that’s maybe about 14, 15 years of the 18 that I used to be a tech writer. Adhering to industry-specific regulations can get very complex. The promise of having a CCMS with version control is being able to prove that this output was created using this version of this topic; I could get that whole list out and prove it to you. But if it’s not integrated within the quality management systems of the entire enterprise, then you’ll find that certain departments will not accept that as proof. Also, the mechanisms between your source files and what you can produce with DITA, you’ve got different ways of compiling your final output, and there’s stuff that you use variables for and the stuff that you’re referencing by keys, and so it’s going to use this version as opposed to that version.
You can also push content at the point of publication, so you don’t see it in that source. However, when you do publish it, then you see this new word in there. How do you prove to the regulatory department that all that content is sane, it is sound, it meets the requirements and so on? That was another really complex thing that we had to deal with. But by integrating the tools between each other, linking topics to requirements, for example, so you always have a requirements database, even if you’re using Jira, that’s your requirements database if you want, but if you can link those two things as a starting point, then wherever a requirement changes, for example, you know which topics are impacted. When you have to do a regression analysis, a topic impact, a change impact analysis, you’re better able to prove that to the relevant departments that, “Well, you changed this requirement. One, we’re sure that all the topics that did refer to that requirement were analyzed and we made the necessary changes, but we’re also sure that we didn’t create any fallback impacts on other topics in the entire manual.”
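The idea of linking topics to requirements so that a change can be traced can be sketched as a simple lookup. This is a minimal sketch under stated assumptions: the topic IDs, the requirement IDs, and the `impacted_topics` helper are all hypothetical, and in practice the links would live in the CCMS and the requirements database rather than in a hard-coded table.

```python
# Hypothetical traceability links: topic ID -> requirement IDs it documents.
TOPIC_REQUIREMENTS = {
    "t_install_pump": {"REQ-101", "REQ-204"},
    "t_calibrate_sensor": {"REQ-204"},
    "c_safety_overview": {"REQ-310"},
}


def impacted_topics(changed_requirements: set[str]) -> list[str]:
    """Return the topics linked to at least one changed requirement."""
    return sorted(
        topic
        for topic, requirements in TOPIC_REQUIREMENTS.items()
        if requirements & changed_requirements
    )


# A change to REQ-204 flags the two topics that reference it.
print(impacted_topics({"REQ-204"}))
```

The same table read the other way gives the second half of the argument: any topic not returned has no recorded link to the changed requirement, which is the evidence a change impact analysis needs.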
There’s a lot of complexity in that, which makes it that you really need to strategize from the start on how you’re going to respond if you’re in a regulated industry, but then there’s also the part where it can help you. It’s a very interesting use case that I saw where we’re mapping DITA XML to machinery standards, and so a company that is an OEM manufacturer is able to supply the exact information required by each of the different subcontractors that we have by mapping that to the iiRDS machinery standard. That is a very interesting use case where regulatory and compliance is enhanced by being able to map those two standards and being able to push the right information based on the metadata attributes and things like that, that are tying both together. You’re easing some of the workload, the heavy lift that used to go on there.
BS: Very cool. I think this is a good place to wrap up, but we’ll be continuing this discussion in the next podcast episode. Dipo, thank you.
DA-C: Thank you very much for having me, Bill.
BS: Thank you for listening to the Content Strategy Experts Podcast, brought to you by Scriptorium. For more information, visit scriptorium.com or check the show notes for relevant links.