August 1, 2018

Full transcript of Content strategy pitfalls podcast: migration

Bill Swallow:       Welcome to the Content Strategy Experts podcast, brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize and distribute content in an efficient way. In episode 33, we continue our occasional series on content strategy pitfalls. Our focus today is content migration. What are some common pitfalls you may encounter during content migration and how might the intrepid content strategists avoid or handle them?

Bill Swallow:       Hi, everybody. I’m Bill Swallow and I have Alan Pringle with me.

Alan Pringle:       Hey there everybody.

BS:                          And as we mentioned, we’re going to be talking about content migration today. So Alan, why don’t you give a brief description of what content migration means or what we mean when we talk about content migration.

AP:                         Well, what we’re talking about today is if you have the source files for your content–whatever kind of content it is, marketing, technical, whatever–in some particular file format, for example, a very ubiquitous file format is Microsoft Word. You are going to use another tool instead of Microsoft Word, for example. So you have to migrate your content from the Microsoft Word file format into your new tool’s file format, whatever it may be.

BS:                          And what are some of the reasons that someone might want to migrate their content? I mean, aside from just moving from one tool to another.

AP:                         Well, one particular reason that our clients have faced is a merger or acquisition, where you have two departments at what were previously two different companies. Let’s say the marketing groups are using two different sets of tools to create the marketing content. It’s very likely that you’re going to consolidate into just one set of tools rather than having two sets of tools. So that’s a pretty common reason why you would have to do that. Another reason, more from a business perspective: if you have done a content strategy analysis and you’re trying to increase the efficiency of how you are creating and distributing content, that may drive a decision, “we need to move to a completely different tool set.” That’s another reason why you may need to look at moving into a migration scenario.

BS:                          So looking at things like consistent look and feel, or consolidating into a single tech stack and generally having some set rules around how content is developed in the company.

AP:                         Right. Generally, it always has to do with efficiency and gains in efficiency one way or another. Your IT department for example, is not going to want to juggle a bunch of different tools that can do the same thing. And it is wasteful to do that both from a support point of view, from an IT point of view, and also internally. Why are you going to pay for all the time and effort it takes to be sure everyone’s up on multiple sets of tools when one tool will do that job for you?

BS:                          Right. And having those multiple tools I would assume also really prohibits content sharing at that point as well.

AP:                         Right. You’ve got siloing, which we have talked about a lot earlier and you probably hear a lot of that term being used, content silos. Having more and more tools does really foster, sometimes … Well sometimes siloing is a good thing, sometimes it’s not. We’re not going to get into that debate now, but in this case, tools can be a very big driver of that siloing and that’s usually negative.

BS:                          Okay. So given a decision that content needs to be migrated from point A to point B, what are some of the problems that you might encounter?

AP:                         Take a look at the source files in your legacy tool, the one that you’re getting away from. The first problem is files that are not formatted consistently, and what I mean by that is a template is not used, or there are a lot of manual overrides, a lot of manual formatting to get the look and feel instead of having that driven by template styles. When you have a scenario where it’s basically the wild west and there is not consistent formatting, it is very hard to systematically, programmatically get that content converted to your new file format. When you are looking at scripts and tools that can help with the migration, they need something to grab onto, and a template with very discrete styles gives the conversion process something to grab onto so it can do basic, low-level pattern matching.

AP:                         So if you have in your old template a style called “body” that is just for paragraphs and you have got something called “para” in your new tool and its styling, you can say, “I want to map all ‘body’ to ‘para’.” But if you were inconsistent about using that “body” tag and you used things called “normal” or whatever else in your previous tool, that automated process is going to have a very hard time latching onto that stuff and doing that matching that I’m talking about. So it’s really important that you do have formatting that is consistent. Also, you may have cases where content that needs to be merged was created with different templates. If you’ve got different templates, you may need to have two slightly different conversion paths, or you may want to say, “You know what? We need to move all of this content into this template.” It may be more cost effective and more time efficient to get everything into one template and then convert it. So there are lots of different scenarios you need to think about regarding your formatting and the use of templates in your legacy content that you’re moving over to whatever new file format it may be.
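The "body" to "para" mapping described here can be sketched in a few lines of Python. This is only an illustration, not the process any particular conversion tool uses: the style names beyond "body" and "para", the `STYLE_MAP` table, and the `map_styles` function are all hypothetical. The point it shows is that anything tagged inconsistently (such as "Normal") falls out of the automated path and has to be handled by hand.

```python
# Hypothetical style map: legacy style name -> element name in the new tool.
# Only "body" -> "para" comes from the discussion; "heading1" -> "title" is assumed.
STYLE_MAP = {"body": "para", "heading1": "title"}


def map_styles(paragraphs, style_map=STYLE_MAP):
    """Split (style, text) pairs into converted items and unmapped leftovers."""
    converted, unmapped = [], []
    for style, text in paragraphs:
        key = style.strip().lower()  # tolerate case and stray whitespace
        if key in style_map:
            converted.append((style_map[key], text))
        else:
            # No rule for this style: a person has to decide what it means.
            unmapped.append((style, text))
    return converted, unmapped


legacy = [
    ("Heading1", "Install"),
    ("Body", "Unpack the box."),
    ("Normal", "Plug it in."),  # inconsistent tagging in the legacy file
]
converted, unmapped = map_styles(legacy)
```

Running the unmapped list as a report before conversion gives a rough measure of how much cleanup the legacy files need.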

BS:                          So making sure that all of your files use consistent formatting and consistent templates. They can certainly help with the mapping there, but are there other things that might break along the way that either may or may not be handled by consistent formatting?

AP:                         Absolutely. And one thing that is often problematic when you are converting from one file format to another: cross references and links. They are the problem children, hands down. Very badly behaved children in the world of content migration. I don’t think I’ve ever worked on a conversion project or a content strategy project that had a conversion component where cross references and links were not a problem. You really have to watch for those, and if you’re working with the conversion vendor, they usually have some very specific ideas and ways to manage those. But they are trickier and you do have to really watch out for those and be sure those are maintained as you move from one file format to another.
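One way to watch for lost cross references after a conversion is to check that every link still points at something that exists. The sketch below assumes the converted content can be summarized as a dictionary of topic IDs and the IDs each topic links to; that data shape, the topic names, and the `find_broken_links` function are assumptions for illustration, not part of any vendor's tooling.

```python
def find_broken_links(topics):
    """Return sorted (source, target) pairs whose target topic does not exist.

    topics: dict mapping a topic ID to the list of topic IDs it links to.
    """
    known = set(topics)
    return sorted(
        (source, target)
        for source, targets in topics.items()
        for target in targets
        if target not in known
    )


# Hypothetical result of a conversion run.
converted_topics = {
    "install": ["configure"],
    "configure": ["install", "troubleshoot"],  # "troubleshoot" was never converted
    "overview": [],
}
broken = find_broken_links(converted_topics)
```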

AP:                         Another thing that can be tricky: images. Just because a file format worked when you brought them into your old tool, doesn’t mean that format is going to work well in the other tool. And when that happens, sometimes what you have pulled into your content files may not actually be the source. You may actually have a source file somewhere else. You converted it to some other format like JPEG, a GIF file, a PNG, whatever. And if you don’t have the source for that, you may have to recreate those images when you move to a new file format or a new content format that requires different types of image file importing. So that’s another thing that you have to worry about as well.

BS:                          And of course all of this basically assumes that you’re moving your content as is from one format to another. But there are certainly scenarios where sometimes you need to kind of break apart your documents and make them fit within a new tool. Perhaps you’re changing the structure of the document in some way or using some kind of chunking to create topics out of long documents, for example.

AP:                         Yeah, absolutely. Right now with all the emphasis on reuse and modularity, especially if you’re looking at longer form content, chapters in a user manual come to my mind. That information was probably put together in the previous tool as, “Okay. We’re going to do a chapter as one file in this tool.” Well, if you’re moving to a modular approach to your documentation where you’re mixing and matching smaller bits of information, what was that giant chapter file is probably going to have to be broken up into many smaller files that are then strung together in the new tool to create, you know, what was the equivalent of the chapter in the previous content workflow. So you’re absolutely right. That also may require a little bit of rewriting because you’re not gonna have the same types of transitions as you did before, and you can’t always assume that chunk A is going to be followed by chunk B, then by chunk C. For example, chunk B may not be in one version of the content, so if there are transitions between and among those things, they may not work like they used to. So it’s both the logistics of breaking things up and thinking about the flow of content and those transitions and whether or not you need to keep those transitions. You may have to rethink them or jettison them altogether.
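The logistics of breaking a chapter file into smaller pieces can be sketched as a split at each heading. This is a simplified illustration that assumes the chapter has already been reduced to (style, text) pairs; the style names and the `split_into_topics` function are hypothetical, and it does nothing about the transitions between chunks, which still need a human review.

```python
def split_into_topics(paragraphs, heading_styles=("heading1", "heading2")):
    """Start a new topic at every heading; attach following text to that topic."""
    topics = []
    current = None
    for style, text in paragraphs:
        if style in heading_styles:
            current = {"title": text, "body": []}
            topics.append(current)
        elif current is None:
            # Text before the first heading gets an untitled topic.
            current = {"title": None, "body": [text]}
            topics.append(current)
        else:
            current["body"].append(text)
    return topics


chapter = [
    ("heading1", "Installing the unit"),
    ("body", "Unpack the box."),
    ("heading2", "Connecting power"),
    ("body", "As described above, plug in the cable."),  # transition that may break
]
topics = split_into_topics(chapter)
```

Each resulting topic can then be written to its own file and referenced from whatever mechanism the new tool uses to assemble the chapter.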

BS:                          And I would imagine that you also have issues where in a long document format, it’s very common to have empty headings. But when you move to a topic-based environment, those empty headings really become problematic.

AP:                         Yes they do. And there are all sorts of things … logistical things like you just mentioned that worked really well in tool A, that won’t work well in tool B. And you have to figure those things out and figure out how you’re going to compromise and change things in your new environment to handle those things. And a lot of times you will discover those things because your conversion process doesn’t output good content. It breaks, for lack of a better word, and then you’re like, “Oh, I didn’t think about that.” So be prepared to get hit with some surprises. I don’t know about you, but I have yet to work on any conversion project where there was not at least a small hiccup, and the problems are usually bigger than a hiccup, but you figure out a way to fix them and you move on. But you’ve got to be … You’ve really got to give yourself time on a migration project to handle that stuff. Build some time into your schedule, pad it to anticipate these unforeseen problems because they are going to come up. I pretty much guarantee it.
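Some of these surprises can be found before the conversion runs. As one example, the empty headings mentioned above can be flagged with a simple scan. The sketch assumes the legacy document is available as (style, text) pairs with heading styles that start with "heading"; that naming convention and the `find_empty_headings` function are assumptions made for illustration.

```python
def find_empty_headings(paragraphs, heading_prefix="heading"):
    """Return titles of headings followed directly by another heading or by nothing."""
    empty = []
    for i, (style, text) in enumerate(paragraphs):
        if not style.startswith(heading_prefix):
            continue
        is_last = i + 1 == len(paragraphs)
        if is_last or paragraphs[i + 1][0].startswith(heading_prefix):
            empty.append(text)
    return empty


chapter = [
    ("heading1", "Maintenance"),  # no text before the next heading
    ("heading2", "Cleaning"),
    ("body", "Wipe the surface."),
    ("heading2", "Storage"),      # heading with nothing after it
]
empty = find_empty_headings(chapter)
```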

BS:                          Oh, absolutely. Especially when you’re looking at migrating from one format to another where one has significantly more robust either formatting handling or structural handling than the other one.

AP:                         It’s particularly problematic when you are moving from a very format-rich desktop publishing environment, where you are specifying your formatting as you apply tags to stuff. I want a heading one here. It’s going to give me this kind of formatting that I can see on the screen as I apply it. When you’re moving to XML or a markup language for your source content, you don’t have that formatting built in. You can say, “Yes, I want to specify this particular chunk of text is a title,” but that title may have completely different formatting when you’re looking at it online on a web page or if you’re looking at it as a page in a PDF file. You’re not specifying that format; you’re just applying a tag. So it’s a great big mind shift, because it’s not WYSIWYG, “what you see is what you get,” as on the desktop publishing side. I think the term that’s popular now to describe especially a markup language model is “what you see is one option.” What you see as you author in a markup language tool probably is just a slight approximation of what the different kinds of output are going to look like when you transform your source content into a data sheet, a user manual, or whatever that content may end up being.

BS:                          And I would suppose also, this goes back to the first thing you mentioned with files not being formatted consistently. You can even have a solid template in place, but if you’re using a formatting-rich desktop publishing tool, I’ve certainly seen cases where people follow a heading one with a heading five because it looks nicer. It achieved some visual goal for the document, but it doesn’t make any real sense when you start breaking the order of the document down.
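A heading one followed by a heading five is easy to detect once the heading levels are pulled out of the document in order. The following is a minimal sketch, assuming the levels are already available as a list of integers; the `find_level_skips` function is hypothetical and only reports the jumps, it does not decide how to fix them.

```python
def find_level_skips(levels):
    """Return (index, previous_level, level) for headings that skip down levels.

    A heading may go at most one level deeper than the heading before it;
    moving back up to any shallower level is fine.
    """
    skips = []
    previous = 0  # treat the start of the document as level 0
    for index, level in enumerate(levels):
        if level > previous + 1:
            skips.append((index, previous, level))
        previous = level
    return skips


# Heading one followed by heading five, as in the example above.
skips = find_level_skips([1, 5, 2, 3, 1, 2])
```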

AP:                         No, exactly. You’re talking about a situation where you have an author that is prettying things up as they write content. Well, that prettying up, in a lot of the tools you migrate to, is no longer something that the author controls. So not only is there the technical migration where you are moving something from file format A to file format B, there is also a migration in mindset. And that goes back to change management, which we have discussed a whole lot in earlier episodes, and that’s a whole other ball of wax. But it’s important to realize when you are migrating content, it is not just a technical issue, there is a training and change management issue that you must address on the human soft skill side of things.

BS:                          Right. Because no matter how nailed down your template is, if it allows people to use it ad hoc, then you can have a document that absolutely conforms to the template, but is structurally incorrect for the target that you’re moving to.

AP:                         Absolutely. Absolutely.

BS:                          So in those cases, I guess some of your best options are to basically look at a rewrite in some form before you even start your migration process.

AP:                         Absolutely. That has to be one of the things that you consider, is it actually going to be cheaper to just throw out the old content and start writing again from scratch, and a lot of times it is cheaper to do that. So you do have to consider that as an option. It is a painful consideration, “All these years of work!” But honestly, sometimes it’s cheaper to just ditch it and begin again.

BS:                          Very true. Especially when you have a solid starting point like that and you know how it needs to be reformed. You’re not starting over entirely from scratch, but yeah, I mean it is an effort.

AP:                         Absolutely.

BS:                          Well, I think that pretty much wraps up this episode. Alan, thank you.

Alan Pringle:       Thank you.

Bill Swallow:       And thank you for listening to the Content Strategy Experts podcast, brought to you by Scriptorium. For more information, visit or check the show notes for relevant links.