How to choose a content model with guest Patrick Bosek (podcast)
In episode 150 of The Content Strategy Experts Podcast, Alan Pringle and special guest Patrick Bosek of Heretto talk about choosing a content model, factors to consider, and when you should think about customization.
“There’s a valid use case for almost every approach that’s out there. There’s no way around that. I think what it really starts to come down to is making sure that you’re matching the 18+ months [ahead] to the decision you’re making now.”
— Patrick Bosek
- Demystifying content modeling
- Is Content as a Service right for you?
- LearningDITA — free, self-paced DITA training
Alan Pringle: Welcome to The Content Strategy Experts Podcast, brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize, and distribute content in an efficient way. In this episode, we talk about choosing a content model and the pros and cons of customizing it with Patrick Bosek of Heretto. Hey everybody, I am Alan Pringle, and today we have a special guest. It’s Patrick Bosek of Heretto. How are you, Patrick?
Patrick Bosek: I’m good. How are you, Alan?
AP: Good. Let’s talk a little bit about content models today, and let’s start really at the beginning, choosing one, do you pick a standard? Do you create your own? What do you do?
PB: I guess if you’re looking for my advice, which I suppose you are since I’m on this podcast.
PB: It’s obviously use-case specific, everybody goes into the process of creating a content model and the first step they look at is like, “What do we need this thing to do?” Well, not everybody, let me back up. People who make good decisions when choosing a content model start with deciding what they need it to do in the long run. And I think the thing that I’ve seen here over the what? 15 plus years that I’ve been working in this industry, is really that there are organizations in which somebody chooses a content model because they think it’s cool and it works for what they’re doing right now. And then there are organizations that look at the total scope of what they’re going to need to do both in their group and then probably beyond their group for the organization in the long run. And then they embark on a much more formalized process for choosing a content model.
Very often you’ll see that where you start, how you start is pretty impactful on what you choose. It’s not real common that the very iterative approach, which is almost an Agile approach in a lot of ways, like, “Oh, this works for this, let’s move it forward and we’ll do it this way, this way, this way,” leads people or lands people on structured content. Typically, that lands people on proprietary tool sets that are built on whatever’s in their proprietary tool set. There are a lot of wikis out there that have their own set of structure in the background, it could be either text-based or it could be HTML-based. MadCap’s got their own proprietary standard. There are a few other tools that are built on standards, which are standards but proprietary.
But I think those are even probably more on the side of things that are intentionally selected. Very rarely do you see an organization that takes this iterative path go into it and then choose something which is structured, because that stuff is typically less available to just put your hands on and just start creating something that you can then spit out a PDF or spit out a website. I think that knowing how it is that you’re going to start and knowing what set of problems and what the time horizon you’re trying to cover are, that’s like getting ready to get ready to choose your standard if choosing your standard is your first step. And I think that knowing where you are there is really critical, how is it that we’re making this choice?
AP: No. And I think you’re right. This is like any kind of business decision, you’ve got to think about what your requirements are and it is not a situation where you are looking at, or I would prefer that people not look at, the next three to six months. Yes, you may be on fire and have something to do, but you’ve got to be careful and balance things out and think, “Are we going to go with something that’s much narrower and very focused on one use case? Are we going to go a little wider and accommodate things that might be, say, 18 months, two years down the road, for example?”
PB: Yeah. I think that’s fair. I think the thing I would add to that is that, there’s a valid use case for almost every approach that’s out there. There’s no way around that. I think what it really starts to come down to is making sure that you’re matching the 18 plus months to the decision you’re making. And so I’ll give you a really good example, I think it’s a really good example anyways, so if you’re just writing a README file for a microservice that you’re setting up and it’s going to be maintained with the code and nobody’s really going to be referencing it, who doesn’t actually have their hands on the code, using true structured content for that makes no sense.
AP: It’s overkill. 100%, yes.
PB: Yeah. And it doesn’t really integrate well with the delivery or with the end user. It would be a bad experience all around. That makes a ton of sense to just use what is supported by the repository that you’re putting that into, which is by and large, it’s either GitHub or Bitbucket or the other one, GitLab, so it’s one of those three, but they all support some Markdown. README files, by and large, if you’re choosing something other than Markdown, you should have a really special case. On the other end of the spectrum though, if you’re thinking about, “Okay, this is going to result in a large set of content, which is intertwined and pieces of it move at different speeds throughout time and different audiences access it in different ways and potentially get different pieces of content based on who they are,” at that point in time, you can’t do that with Markdown without doing a lot of custom stuff.
AP: I was going to say, there’s some people who might tell you that you can do that, but I’m pretty sure you should not. How’s that?
PB: Okay, fair. That’s an important distinction. You shouldn’t do that with Markdown. This is a tangent, but I love tangents. I was looking through the software documentation, I can’t remember which company it was, it might’ve actually been one of the Git providers, and they’re still in Markdown with a lot of their content. But they’ve gone so far with pieces of this to customize it for this platform or this audience or that, they don’t have tags in there, but they have things that are tags in there. I think they’re square brackets with a percent sign or something, and then a name after it, and then an end one too. And I’m like, “These are just tags.”
The thing is, once you get to a certain level of sophistication with effectively having to put metadata in your content to tell your content how to behave in different circumstances or to just expose information to other systems like search systems or AI systems or whatever it may be that isn’t the same information you’re exposing to the end user, you have to do it with tags. There’s no other way to do it, because all tags are is a way of putting information into a document that isn’t rendered directly to the user.
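To make the "these are just tags" point concrete, here is a minimal sketch in Python. The `{% audience ... %}` syntax, the `render_for` function, and the sample text are all invented for illustration and are not taken from any particular Git provider or Markdown tool; the sketch only assumes that some bracket-and-percent marker wraps content meant for one audience.

```python
import re

# Hypothetical tag syntax invented for this sketch:
#   {% audience NAME %} ... {% endaudience %}
TAG_BLOCK = re.compile(
    r"\{% audience (\w+) %\}\n?(.*?)\{% endaudience %\}\n?",
    re.DOTALL,
)


def render_for(source: str, audience: str) -> str:
    """Keep blocks tagged for `audience`, drop other tagged blocks, strip the tags."""

    def keep_or_drop(match: re.Match) -> str:
        tagged_audience, body = match.group(1), match.group(2)
        return body if tagged_audience == audience else ""

    return TAG_BLOCK.sub(keep_or_drop, source)


# Sample "Markdown plus tags" content (also hypothetical).
SOURCE = """Install the package.
{% audience admin %}
Restart the service after installing.
{% endaudience %}
{% audience user %}
Ask your administrator to restart the service.
{% endaudience %}
Done.
"""

print(render_for(SOURCE, "admin"))
```

The markers never reach the reader; they only tell the processor how the content should behave, which is the same job an angle-bracket tag with an attribute would do.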
AP: Yes. You’re adding intelligence into your content.
PB: You may not like angle brackets, but if you have this case, you’re going to use some kind of tag at some point. And this actually relates in a funny way, and I promise I’ll stop my tangent in a second, to another conversation I had with one of DITA’s founding fathers, or one of the longest standing people on the TC, Eliot Kimber, who you know and who I’m sure everybody listening to this is going to know. And I love challenging him with stuff, because he has such good answers to everything.
And so I thought I would play devil’s advocate and be like, “Well, why not use a text based format? Why not use Markdown?” And we started going through some stuff and he was like, “Well, why not use DITA if you have all those cases, if you’re going to do all this stuff, if you can do all this stuff, why wouldn’t you just get that out of the box? At what point does it make any sense to ask the question, why not use Markdown plus this, plus this, plus this, plus this, plus this?” And all you’re doing is rebuilding DITA, which frankly is probably my position, not probably, it is my position anyways. But it was interesting to watch how Eliot got there and the way that he positioned it was so very Eliot and it was… I don’t know. I loved it.
AP: At some point, if you start with something maybe a little more boxed in, that’s not a technical term and you keep having to add things to it to do what you need to do, and that happens a lot, that may tell you you’re possibly a little too constrained, perhaps.
PB: Well, I don’t know if I would use the term boxed in, because I think boxed in implies a structured starting point that has limits around you. Whereas I think the reality is that, especially with a lot of the text-based formats, they’re so open, there’s no standard. There are generally accepted practices if you want to call them that, but you can put anything you want in there and you can build a processor that processes it, literally anything. And I’ve seen so many bespoke things put into these formats over the years, that you realize that there is no box and you can do whatever you want with them, which in some ways is the beauty of them. But as you scale, more people have more ideas and there’s no box, so they can add whatever they want and they can add onto the processor. And then some of those people go to other places and some people didn’t document what they did and they forget why they did it and blah, blah, blah, blah, blah. And you have system creep. And the problem is, the system creep is built into your fundamental content structure.
AP: Your choice is enabling what you just said. Exactly.
PB: Right. That aspect, the fact that there’s no separation of concerns there, when you’re building this custom stuff directly into your core content in a way which isn’t patterned and isn’t based on a larger set of standards and rules, means that you’re evolving something which is innately going to become brittle eventually.
AP: Sure. And as you add business requirements, that brittleness that you’re talking about can become basically magnified from my point of view, say for example, you have a merger and you’ve got two companies doing similar things, yet they’ve got two entirely different content models, two entirely different tool chains, at the end of the day, are you going to keep both of those things? I’m going to guess not. At that point, you’re going to have to go through a process of figuring out what you’re going to do. Is it going to be survival of the fittest? Are you going to do some bake off? Are you going to have someone come in maybe and take a look and say, “What should we pick?” There’s some options there.
PB: And so the merger is a great example, and it’s a clear vision of when two different content infrastructures are going to collide and something’s going to have to win. But I almost think that the merger, people tend to feel like it’s really distant. Nobody goes into work every day and thinks about mergers except bankers.
AP: The people who make the money from them.
PB: Right. But people in tech pubs, they don’t think about the merger until it hits them. But the thing that isn’t distant is product evolution. And a lot of products will… You’ll start a new project, and this will be its own product maybe, and it’ll build up, build up, build up, build up, and then you’ll realize it needs to be merged into this other thing. Or it can go the other way, where you’ll start this module in the product and it will build up, build up, build up, and you’ll realize, “Oh, it needs to be separated out.” And these are mini mergers.
AP: Yeah. Absolutely. Internal mergers.
PB: Totally. And the thing that you’ll see here is that, when you’re keeping content isolated into the product and it doesn’t have this box, so there’s no standard rules that go across all the different products, that when you have to bring them back together, maybe somebody in this product team decided, “Oh, we’re going to add this MDX component or this thing or this thing,” and then it doesn’t work now. Or it could conflict with some other version of that from somebody else that was very similar, because they’re not talking to each other, because they’re not on the same product team. And that can become a fundamental problem. And then even beyond that, the thing is, while those are two separate things, you’re still one company.
AP: Yes. Siloed tech stacks, essentially, more or less content tech stacks.
PB: And siloed user experiences. You go to the documentation for this product or module or whatever it may be, and it’s got this structure and this interactivity and it looks like this. And you go to this other one and it’s like, “Oh, okay, well this is same colors but functions very differently, navigation, all this stuff is just separate.” I think this element of consistency, when you don’t have accepted standards across the organization, it shows up. It shows up in efficiency, it shows up in user experience, and both in the customer and the employee side.
AP: The customers do not get a consistent experience, they don’t get consistent messaging. And I’m sure the marketing folks will be really happy about that when you are basically giving two different flavors, yet you’re the same company.
PB: Totally. Well, two is probably a best case scenario.
PB: I think it might be more like 40 in some cases.
AP: From what I’m hearing from you, should thinking bigger always be in your mind when you’re talking about modeling, then? Or is that unfair?
PB: Okay. I guess we’re returning to the question, choosing a content standard or a content model. I think being aware is what’s important. I think most organizations have a general concept of trajectory, what things look like, what the culture of the organization is going to look like. And not everybody needs scale, not everybody needs consistency across many parties because they’re just never going to be there. There’s plenty of hardware companies, software companies, any kind of company out there that’s just never going to have more than three writers. That is a situation. And in those cases, do you really have to think bigger? No, probably not. Should you? I guess that’s a different question.
AP: It’s a balancing act. I think that’s the best way I would put it. You’re right, with three writers, if you’ve got three content creators that presents a different set of challenges, problems that you need to solve versus having a team of three digits. It’s a completely different beast. It is.
PB: Totally. The reality is that you can get to know two other people very, very well and you can read all their stuff and just by the nature of that, you can stay on the same page.
AP: And then of course, the consultant in me says that group of three, what if your company takes off and you have all this growth that three could become nine or it could become 12 or it could become 15. You never know.
PB: Yeah, for sure. That’s the big question that I think that organizations have to wrestle with. If you know that growth is coming, in my view, it’s irresponsible not to choose something that will facilitate that growth. But if you don’t think that growth is coming or it’s not on the horizon, it might be the responsible thing to choose whatever is going to work with relatively low implementation friction and a good customer experience for your small group at that time. And then once you start to see that growth coming, be proactive in terms of transitioning to something which is going to support that growth.
AP: And that comes to a point I want to make here. If you do decide that a, let’s say smaller, and I don’t mean that in a pejorative way, a smaller solution, smaller scale solution, if you do go that route because it’s a good fit, I would suggest you have your eye on an exit strategy then and there when you make that choice, think about your exit strategy, where you might need to go next and think about how you could map where you are now to the new thing. Does it have to happen immediately? No. But I would recommend that you have that in the back of your head filed away because you may need it sooner than you think.
PB: Totally. I think that’s absolutely fair. The reality is that, when you’re trying to do complex content at scale and choose your axes that you want to put complex on, so it could be personalization, it could be multi versions, it could be multilingual, I could continue going, there’s all these different ways that content can become more complex, regional is a great example, this content applies to this region versus this region, which again, it’s personalization, but it’s a special form. When you’re in that circumstance, you really have to choose something that’s going to support that. And that doesn’t really matter if you’re one or 100 authors. You need to recognize that that’s your circumstance, where it’s like, “Okay, we’re going to have a personalization requirement.” Or, “We’re going to have a complex versioning requirement.” Or, “We’re just going to have so much content that isn’t highly isolated, especially content writer ratio or highly collaborative, that we need structure to support that.”
Think in the physical world, why do skyscrapers have more structure underneath them than houses? It’s because they need to be bigger. And so when you know that you have these situations, you have to match your content model selection to those things. And so when you’re starting to think about like, “Okay, what content model is going to do that?” Unless you’ve got a really specialized case or you’re in an industry that’s had a content model specifically built for them, aerospace is the one people throw around a lot.
AP: JATS for technical journals, things like that.
PB: JATS for journals. That’s a great one. The reality is that DITA is the gold standard for that stuff. DITA teams three to 300, they’re highly performant when they’re well-trained and they can build anything to any size you need in terms of content. There isn’t an upper limit for good DITA implementations. And part of that is, one of the words we said we weren’t going to say today, but it’s the ability to specialize DITA. That’s DITA’s secret sauce that I think a lot of people don’t realize how important it is. And this ability to extend DITA without breaking what you’re currently doing is enormous. The business value there is beyond.
AP: Just to give people some context, before we got started, we were talking about not trying to go too deep down the whole DITA specialization path and what it is. And just for a quick, quick like 10,000 foot summary of it, specialization is a way that you take existing elements that are in the DITA standard and you basically build new structures based on things that are already in the standard. And that’s how you customize. And that is probably an oversimplification, but I want to throw that out there for people who are not familiar with the term. It is just a fancy way in DITA speak of saying customize the DITA model.
PB: There’s one key thing there, and this is the only thing I think you really need to know about specialization. One key thing you didn’t mention, which is that when you take an element in DITA and you specialize it, all of the DITA processors understand your new element to be a version of your old element. It’s like, “Why does that matter?” Well, that matters because if we have a Scriptorium content model and in our thousand person Scriptorium company, someone named Tony goes and decides that they’re going to add this new functionality and it needs this new structure, they specialize off of the base content model. Even if that content comes back into reuse in other parts of the organization that haven’t implemented specific functionality for Tony’s new element, because those elements are derivatives of the underlying elements, they just get treated that way. You can have structured, planned, asynchronous evolution of the model across a large enterprise that doesn’t break all the different delivery mechanisms that are based on the fundamental cross enterprise understanding of the model. That thing there, that’s what makes DITA enterprise grade and everything else not.
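The fallback behavior described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular DITA processor is implemented: the `safety-note` element, the `tony-d` module name, and the `HANDLERS` table are hypothetical. What the sketch does rely on is DITA's `class` attribute convention, where each element carries its ancestry as space-separated `module/element` tokens ordered from most general to most specific.

```python
import xml.etree.ElementTree as ET

# Handlers a delivery pipeline already has, keyed by "module/element" tokens.
# Only base elements are covered; nothing here knows about Tony's new element.
HANDLERS = {
    "topic/note": lambda el: f"[NOTE] {el.text}",
    "topic/p": lambda el: el.text,
}


def render(el: ET.Element) -> str:
    """Render with the most specific handler found in the element's class ancestry."""
    # The first token is the "-" or "+" prefix; the rest run general -> specific,
    # e.g. "+ topic/note tony-d/safety-note ".
    tokens = el.get("class", "").split()[1:]
    for token in reversed(tokens):
        handler = HANDLERS.get(token)
        if handler is not None:
            return handler(el)
    raise ValueError(f"no handler for <{el.tag}>")


# Hypothetical specialized element. In real DITA the class value normally comes
# from the DTD or schema as a default rather than being typed into the content.
doc = ET.fromstring(
    '<safety-note class="+ topic/note tony-d/safety-note ">Wear gloves.</safety-note>'
)

print(render(doc))
```

Because the lookup walks back through the ancestry, a team that has never heard of `safety-note` still renders it as an ordinary note, while a team that wants special behavior can register a `tony-d/safety-note` handler without touching anyone else's pipeline.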
AP: Absolutely. You could say that it puts the extensible in XML, even though I know I’m mixing DITA and XML here. It is super, super extensible. And to me, people talk about, “Should I pick an open standard like DITA or should I do a custom model?” Well, from my point of view, you can have both based on the discussion that we’re having. That’s where my brain is going right now. I will say X years ago, actually X decades ago, I remember creating custom models mostly in SGML. There was no standard out there to support what we were trying to do. 20, 25 years later, we have DITA, which if that were available when I was creating those models nearly 30 years ago, you better believe we probably would’ve picked it, because it probably did 85, 90% of what we needed the custom model we created to do, so it takes care of that problem. And as you said, you can customize it without breaking the bigger picture. And that’s a big deal. That’s a really big deal.
PB: Yeah. It’s enormous. It is the business case for why you go through the upfront implementation to do DITA, because you look at content operations implementations that have been around and are still modern and are still delivering an ROI year-over-year, and they’ve been around for a decade or more, they’re all DITA, every single one of them. There’s no such thing as the 15-year-old Markdown implementation. It’s the same thing as with the wikis, they go through cycles. If you want something that’s going to serve you today and then in the long run and you have the ability to do the upfront, it’s going to work. Going back to your SGML comment, I think one of the best ways that DITA was ever positioned to me, this goes back a long ways, this is a friend of mine, he basically said, “The reason they invented DITA was so they didn’t have to do the first million of customization on every project for SGML.”
AP: I agree 110% with that, having lived through what you just said. Yes, 100%.
PB: It was just reinventing the wheel every company, and it was a really expensive wheel and people were like, “Let’s stop doing this.”
AP: Right. It is cost savings off the bat, because like I said, if it gives you the majority of what you need, you can customize it and flex it to make it do what you want to do. I want to quickly investigate the flip side of that coin, are there times when you should not be customizing/specializing your DITA model?
PB: When you don’t need to. In a lot of ways, I think specialization, especially day one, less is more.
PB: Yeah. You should have a really, really good reason for specializing. And I think the thing that’s really challenging about specialization is that, if you think about it in terms of other technologies, it’s one of the four features that should convince you to go with DITA. But very few organizations use it day one and very few organizations should use it day one. But the reality is that over time, you’re going to find cases that you just can’t efficiently support in other ways. And the alternative to having specialization is something like HTML classes or some other XML attribute that you throw onto something or some tag thing you invent in Markdown. But it’s super difficult to validate that and to make sure that it’s used consistently. And none of that stuff really effectively translates back to the rest of the publishing pipeline in a way that is consistent. There’s not a strong process for it. You have to invent the process and the pattern and then actually do the thing you want it to do. Should you specialize day one? Sometimes. I would ask your friendly neighborhood consultant about that one.
AP: I’ll tell you right now, sometimes when it comes to metadata, starting early with that is a requirement, that is based on some past project experience. Yes.
PB: That’s fair. And I think the way that you end up managing the metadata, because metadata is a really… We might have to decide what you mean by metadata, because I can make anything metadata.
AP: Things you shouldn’t, by the way. Yes, you can.
PB: There’s definitely a conversation around, where does your organizational intelligence and taxonomy and terminology weave into your content model?
PB: I think there are cases where that is specialization and there are cases where it is not. Sometimes it’s really something where you want to use more standardized taxonomy mechanisms, or maybe you want to use on-document metadata, et cetera.
AP: Yeah. There are layers there. You’ve got some choices and you can have other tools carry that burden too, that play well with your system. That’s a possibility as well.
PB: Totally. Yeah.
AP: Yeah. The only other thing that I’ll add to this, is sometimes just because you’re doing something the way you are now doesn’t mean it’s the right way moving forward. And you should not knee-jerk decide you must customize structure to match the way that you were doing things right this second. I would pause and look at things very hard before you decide, “Absolutely I must customize because we’re doing it this way.” Well, if you’re doing it this way now, for example, are your delivery formats going to be the same as they are right now? Is that going to lend itself well to all these different new online formats and things like that? Is it going to lend itself well to talking to other systems via API? I could go on and on. Basically, take a deep breath and decide if what you’re doing now is truly something you need moving forward. There’s a chance that you may need to compromise or rethink the way you’re doing it, maybe in a way that the DITA structure already supports with no customization whatsoever.
PB: I want to break down your point about doing it now for a second, because I think this is really important. There’s doing it now in terms of what are you publishing now? What are your target publish outputs? And then there’s doing it now, in terms of what are your internal practices? In terms of how it is you’re actually creating the content, what’s going into the content, how the content is coming together, how the content is moving, so do you have a distributed model where writers really don’t talk to each other that much other than at the water cooler, but they write their own books? Do you have a collaborative model? Doing it now can be so many things in the background.
AP: It’s not just publishing, it is also creation. That’s a very good point. Absolutely.
PB: In terms of doing it now for publishing, one of the things that I think is really critical is understanding the trajectory of your publishing and where it’s going. If you implement structure properly, there shouldn’t be a lot of publishing cases you can’t handle, generally speaking. And if there are, that’s typically where specialization comes in, if you need more semantics, more data typing, more of this to pull something out, because a lot of publishing cases that are more on the upper ends of complexity, what they’re really doing is, they’re doing intelligent selection. They’re saying, “Give me the things like this that connect to this under these conditions.” Something like that. When you’re looking at your outputs now, having a general concept of trajectory I think is really important. But then, the core point that I want to make here is that, a strong separation between doing it now on the backend and doing it now in terms of publishing is what you need going forward.
And this is where almost every wiki based or HAT tool based or whatever else, which is write it and publish it, WordPress. This is where they all break down because there’s no separation or very, very little separation between what you’re doing on the back end and what shows up on the front end. You can’t evolve those two things independently. And that rigidity means that you get stuck and you can’t do the things you need to do on either side. When you’re building a new content operations ecosystem and you’re redoing these things and you’re thinking about, “What are we going to do in the future?” I would say even more than the content model you choose, you need to choose a content operations ecosystem that has separation of concerns and where you can evolve the different components independently without breaking one or the other.
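A small Python sketch can show what that separation looks like in practice. Everything here is an assumption made for illustration: the `Topic` class stands in for back-end content, and `to_html` and `to_plain_text` stand in for two independent front ends. It is not a description of any specific product.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Topic:
    """Back-end content: structure only, no presentation decisions."""

    title: str
    paragraphs: list[str]


def to_html(topic: Topic) -> str:
    """One front end: a web page fragment."""
    body = "".join(f"<p>{escape(p)}</p>" for p in topic.paragraphs)
    return f"<article><h1>{escape(topic.title)}</h1>{body}</article>"


def to_plain_text(topic: Topic) -> str:
    """Another front end, added later without touching Topic or to_html."""
    return "\n\n".join([topic.title.upper(), *topic.paragraphs])


topic = Topic(
    "Reset the device",
    ["Hold the button for 5 seconds.", "Wait for the light."],
)

print(to_html(topic))
print(to_plain_text(topic))
```

The content never mentions headings, fonts, or page layout, so either renderer can change, or a third can be added, without anyone rewriting the topic itself.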
AP: That is really good advice. And I really like your front end and back end distinction, because I think that is very important and it’s very easy to conflate those two things, by the way, especially if you’re working in an environment that already combines those things. I think that’s really, really good advice. And before we wrap up, is there any other smart point you want to leave our listeners with in regard to picking a content model?
PB: Smart points? I don’t know. I don’t know if I do those. Do I do those?
AP: Well, you just did one with the back end front end distinction, so if we want to leave it there, we certainly can.
PB: How about I build on that just slightly?
PB: Just to make sure it’s really complicated.
I think that front end and back end is one level of maturity when you’re thinking about the separations of systems and a content operations system. But I think that one of the things you’ll see is that a lot of organizations will evolve to the point where it truly is an ecosystem. Think about it this way, you have your centralized content repository, that’s where most of your authoring is done. It’s where your prose is written, et cetera. And then you have your primary front end, which is typically a website, but it’s probably mobile ready and whatever else as well. And you have other front ends too, so you have separation of front end and back end in that way. That’s what we were just talking about.
But it’s very common that you start to see there are other systems which integrate with the back end. You might have a system that manages your API documentation, which is typically generated, it’s not written. However, the usage information around the API documentation, that provides the developers the context and the instruction to know what they’re looking at, that’s all written. That goes into the content repository. Now you need a connection between those two things, and you need some kind of a mechanism where they’re going to be able to play nicely together to some extent, at least to get the information generally to the front end without having multiple experiences where a user has to bounce back and forth between raw reference and then guided more learning style content. You might also have a system which holds information about your product, so it could be product configurations, it could be product specifications, it could be all different kinds of things there.
And that information is oftentimes going to come over in a tabular format. And a tabular format is a really interesting thing, because when you’re looking at simple tabular formats that can be represented as a CSV, it’s not a great idea when you’re doing data exchange, but it can always be represented in a tags-based format. Any tabular format can be represented as tags. And even the most complicated Excel stuff under the hood, it’s XML basically, or it can be exported as XML. You start looking at that and you go, “Okay. Well, what if we’re going to start mixing in things which are more tabular data from other systems into the flow of content that we’re producing for the information experiences down the line? Okay. How are we going to support that in the future and how is that going to come together?”
When you start thinking about these things, there’s the awareness that your most basic content operations is a word processor on a desktop, and then you move to front end and back end. But eventually, if your organization demands it and your customer demands it, and you have to deal with them in that way, you’re going to be in a place where you’re at an ecosystem and there’s going to be content going back and forth between systems. There’s going to be something which is pulling together information, data, and content, and then it’s pushing it out to experiences. You’re going to have multiple experiences.
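The claim that any simple table can be expressed as tags is easy to sketch with Python's standard library. The element names (`table`, `row`, `entry`), the `column` attribute, and the sample product data below are assumptions chosen for the example; they are not DITA table markup or any specific exchange format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> str:
    """Convert CSV text with a header row into a simple tagged table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    table = ET.Element("table")
    for record in body:
        row = ET.SubElement(table, "row")
        for name, value in zip(header, record):
            # Column names go in an attribute so headers containing spaces
            # do not have to be valid element names.
            cell = ET.SubElement(row, "entry", {"column": name})
            cell.text = value
    return ET.tostring(table, encoding="unicode")


# Hypothetical product-specification data from another system.
SPEC_CSV = "model,voltage\nA100,12\nB200,24\n"

print(csv_to_xml(SPEC_CSV))
```

Once the data is tagged, it can flow through the same pipeline as written content, which is the mixing of data and content the passage goes on to describe.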
AP: What you’re describing is content as a service. That’s what I’m hearing.
PB: Yeah. Content as a service is one of the things that comes out of this. I just think that when you step back and you’re like, “Okay, what does our organization look like in terms of the information that in a perfect world our customer could access and could access in a seamless way, without going to different experiences and having to navigate around when it should be together, it is together? That’s a big thing. We have these five things when they should be together, let’s put them together.” That’s a thought exercise, that’s worth an afternoon when you’re about to decide how it is you’re going to build your next content operations ecosystem. Because if there’s nothing else I can promise you, it’s that whatever you choose when you set up a content operations ecosystem, even if you don’t actively choose, it’s going to be with you longer than you think.
AP: Absolutely. And I think that’s a good place to end and a good caveat to choose wisely and choose well. Patrick, thank you very much for your time. We appreciate it.
PB: Yep. Thanks for having me.
AP: Thank you for listening to The Content Strategy Experts Podcast, brought to you by Scriptorium. For more information, visit scriptorium.com or check the show notes for relevant links.