Using text strings and microcontent (podcast, part 1)
In episode 90 of The Content Strategy Experts podcast, Gretyl Kinsey and Simon Bate talk about using text strings and microcontent. This is part one of a two-part podcast.
“They’re starting to get the idea of taxonomy and how important it is for all parts of their business to communicate using the exact same language. If this can be captured and put in one place, then those strings can be available to everybody.”
– Simon Bate
Gretyl Kinsey: Welcome to The Content Strategy Experts podcast, brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize and distribute content in an efficient way. In this episode, we talk about using text strings and microcontent. This is part one of a two-part podcast.
GK: Hello and welcome everyone. I’m Gretyl Kinsey.
Simon Bate: And I’m Simon Bate.
GK: And we’re going to be talking today about text strings and microcontent, so I think the best place to start is just by defining what each of those things are.
SB: Yeah. Well, text strings, it’s one of these things that grew out of computer science, and all it really means is a sequence of text characters. Usually, it’s a paragraph or shorter. It’s more often just a sentence or just a snippet. It’s, essentially, a series of individual characters.
SB: Microcontent is a little bit more specific and it has a number of definitions. The first one was about 1998, when Jakob Nielsen coined it as a small group of words that can be skimmed to get a basic idea. And the idea here is something like a title or a headline or something like that, where you just look at it and you can see immediately what’s there. Since then, it transmogrified, and now it often refers to small information chunks that can be used alone or in a variety of contexts. And of course, one of the big contexts where people are really interested in using microcontent now is chatbots and voice response systems.
GK: Yeah, absolutely. And that gets into one of the next things I wanted to talk about, which is all the different ways that we’ve seen this concept of microcontent come up in our work with clients. We’ve had all kinds of different requests around using microcontent and use cases for it. So what are some examples of that?
SB: Oh, we’ll get things like, people say we need all our strings maintained in just one place, so that we can then use those individual strings on a number of device interfaces. Sometimes people have a localization issue. They’ve got these strings in their devices, wherever their strings are used. And in this case, I’m talking about strings that are often used as part of an onboard device control system, so you get a small display panel and it pops up some words, some phrases or instructions about what someone’s supposed to do. Some of these are localized to many, many different languages, and so, it’s good to be able to have those strings all in one place and be able to localize them in a good, organized way.
SB: Other people will say they’ve got a new device coming online and it needs its strings in JSON format. And so, they need to know how to write and maintain the content and then export it so then it can be available in JSON.
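As a rough illustration of the export step described here, the following Python sketch writes a small set of strings out as JSON. The string ids, the field names, and the export_strings helper are all assumptions made up for this example; a real device would dictate its own JSON layout.

```python
import json

# Hypothetical string records; ids, field names, and wording are illustrative only.
strings = [
    {"id": "door_open", "text": "Close the door before starting.", "short": "Close door"},
    {"id": "filter_full", "text": "Empty the filter.", "short": "Empty filter"},
]


def export_strings(records):
    """Return a JSON document that maps each string id to its long and short forms."""
    payload = {r["id"]: {"text": r["text"], "short": r["short"]} for r in records}
    # ensure_ascii=False keeps non-ASCII characters readable once strings are localized.
    return json.dumps(payload, indent=2, ensure_ascii=False, sort_keys=True)


exported = export_strings(strings)
print(exported)
```

Keying the output by id is one possible choice; it lets the device look up a string directly instead of scanning a list.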
GK: And just for context, for those who may be unfamiliar, what exactly is the JSON format?
SB: Sometimes we have requests, people just need company-wide consistency of terms. They’re starting to get the idea of taxonomy and how important it is for all parts of their business to communicate using the exact same language. If this can be captured and put in one place, then those strings can be available to everybody, and everybody, when they need them, they can say, okay, I need the string that describes this thing. And so, quite a lot of the time we hear that request.
SB: Another simple one we’ve just described earlier, they say, we’re implementing a chatbot. We need a way of creating and maintaining the strings. And of course, when you have a chatbot, there’s loads of metadata that goes with those strings. So they have context and that has to accompany it. And sometimes, we get people, they’ve actually worked out a lot of these problems before, but they say our spreadsheet solution isn’t workable anymore. So essentially, they’ve got all these strings, but they’re maintaining them just as a spreadsheet and trying to add new columns when they add languages, or add new columns when they add additional uses of a term, and so on. But in some ways, they’ve actually got some of the issues worked out. They just need a good repository for storing the information.
GK: Yeah. And that really gets to some of the things we’ve talked about in some of our previous episodes around taxonomy and metadata, that planning things out in a spreadsheet is a really great starting point, but you do eventually hit this turning point where it’s not really going to be sustainable as you scale up. So that’s really a great point when you reach that to say, okay, maybe we do need to look at a different way to work with these strings.
SB: And that goes two directions too, because there’s the spreadsheet itself and the structure of the spreadsheet and trying to keep the information in it, and then there’s the whole management issue of who has the spreadsheet? What’s the latest spreadsheet? Now, of course, with Microsoft Office online and tools like that, maybe the spreadsheet can be shared around, but there is often an issue when you’re maintaining things in a spreadsheet about who has control over it. Whereas, if you have a CCMS or something like that, then the content is much more easily controlled.
GK: Yeah. You have a lot better control, like you said, over the content governance aspect, whereas when everything is just in a spreadsheet, you’re locked out in a way and it’s not as easy to disseminate that information to everybody. And this gets into the next thing that I wanted to discuss too, which is the idea of planning, because I think spreadsheets are a lot of times the starting point for that. So when people start to plan for the use of these text strings or microcontent, what are some of the factors that go into that?
SB: Well, there are really three angles to planning your text strings or microcontent and they are content creation, the maintenance of that information, and then delivery. And all of these factors inform your final design. And it’s a mistake to try to tackle these in order and say, Oh, let’s design the creation first and then handle maintenance and delivery. They all have to be developed all at the same time.
GK: Yeah, because all of them really play into each other in different ways. And when you are coming up with that strategy and figuring out that plan, you have to think across all three of those different angles. So let’s dive into each of those a little bit more and talk about, first, the aspect of creating content.
SB: Yeah. So often, when you’re creating the content, an XML solution or particularly DITA works very well for maintaining the content.
GK: And then, one of the things that we see often with DITA is reuse, so how does that come into play?
SB: Well, it depends really how it’s going to be used, because there are two ways of looking at it or two ways of using reuse. One is you may be creating strings and those strings are output. And then, that output is reused across a number of individual devices. And then, there’s also the true DITA sense of reuse, in that you create these strings and you make them available for reuse across topics. You could actually need strings for both of those purposes, but in some ways, that’s going to inform your decision about how exactly you store the content in your storage solution.
GK: Yeah, absolutely. And we often see the need also for different forms of the same string. So, for example, a version of it that is abbreviated. So how is something like that identified?
SB: A lot of it’s, you have to know your content. Look at your content and know how it’s used. And this is a really big thing when you get to device strings, because there are times when the same string or the string with the same idea may need to be expressed in a number of different ways. You may need a short form of the string. It could be a phrase or a sentence or instruction, but there could be some applications where there’s not that much screen real estate, or they need a smaller number of characters to communicate the same thing. So you may actually need two different forms of the same string. You need a long form and a short form.
SB: And then, as we were talking about abbreviations, there are also times when you have a string or you have a term, and sometimes you just need to use that term abbreviated. You could actually be using these strings to kick out labels or something that go on a display panel or something like that. Sometimes those need to be abbreviated.
GK: Yeah, absolutely. I want to talk about some of the other considerations that go into this. And one of them that’s a really big one is metadata. So how does that come into play?
SB: Yeah. This is absolutely a big area of what you have to think through in planning your project. There are two main areas where metadata is going to come into play. And one is for your authors, because as they’re creating the content, they need to know what is the string for? What’s the final purpose of the string? Where is it going to be used? And that then informs also for the authors, what are the considerations they need to use when writing or maintaining it? You may need to leave an instruction behind or something to say, this can’t be any more than 50 characters, or I made a particular decision or corporate or legal has made a particular decision about what we can say here or what we can not say here. So that kind of information is really useful if it can be maintained in metadata, along with that string.
SB: Now also, there’s how the string itself is going to be used. So the consumer end, so there may be something like there’s an identifier associated with that string. Because when you create a GUI, from the GUI, for every component in the GUI you’ll have an identifier. And if you can use that identifier then to link to the string, you know which string is going to be used for which component in the GUI. And when I say GUI, of course, I mean, graphical user interface. And the others are things like keywords and things that might be used by a chatbot, or a voice response system.
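To make the two kinds of metadata a little more concrete, here is a minimal Python sketch of a string record that carries author-facing information (purpose, length limit, a note) next to consumer-facing information (a GUI component identifier and keywords). Every field name, identifier, and value is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StringRecord:
    """One string plus the metadata that travels with it (field names are hypothetical)."""
    component_id: str                  # consumer-facing: GUI component that shows the string
    text: str
    purpose: str                       # author-facing: what the string is for
    max_length: Optional[int] = None   # author-facing: character limit, if any
    author_note: str = ""              # author-facing: e.g. a legal or corporate decision
    keywords: list = field(default_factory=list)  # consumer-facing: chatbot or voice matching


records = [
    StringRecord("btn_start", "Start", "Label on the start button", max_length=10),
    StringRecord(
        "msg_door",
        "Close the door before starting.",
        "Instruction on the status panel",
        max_length=50,
        author_note="Wording approved by legal; do not rephrase.",
        keywords=["door", "start"],
    ),
]

# Index by component identifier so an interface can find the string for each component.
by_component = {r.component_id: r for r in records}


def find_by_keyword(word):
    """Return the component ids of all strings tagged with the given keyword."""
    return [r.component_id for r in records if word in r.keywords]


print(by_component["btn_start"].text)
print(find_by_keyword("door"))
```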
SB: One good thing to know in metadata is there are emerging standards for some of the metadata. In particular, tekom is building a metadata standard. It’s been out for a little while now. It’s called iiRDS, or intelligent information Request and Delivery Standard. And it’s a standardized metadata vocabulary for describing technical content. It’s sensibly built because there are some things that are just fixed and standard pieces of metadata, but it’s also built to be expanded. You can add your own content to the metadata, because of course, every use, every application of these strings is going to have its own special needs, its own special considerations.
GK: So in addition to metadata, another big consideration is localization, right?
SB: Yeah. There are a number of considerations you have to apply if localization is going to be one of the reasons why you’re creating these strings, or if your strings are going to be localized in the first place. Number one is there’s often a difference in length of strings for different languages. If you’re going to be localizing English text, say to German or Russian, there’s a great expansion of the length of the strings. Now, the people building the devices where your strings might be used will also have to know that the device itself is going to be marketed in other areas. They’re going to have to be able to accommodate these longer strings. This eventually comes down to the creators, and they have to know that within a particular language, the strings may have a maximum length. There may be a maximum size for the screen interface. And again, that gets back to the metadata that describes the string itself and what that string is for.
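A length check of this kind could look something like the Python sketch below. The 16-character limit, the language codes, and the sample translations are assumptions for illustration, and counting characters with len() is a simplification: a real display may limit pixel width rather than character count.

```python
# Hypothetical limit taken from the string's metadata.
MAX_LENGTH = 16

# Illustrative translations of one short instruction.
translations = {
    "en": "Close the door",
    "de": "Schließen Sie die Tür",
}


def over_limit(texts, limit):
    """Return the language codes whose text is longer than the limit, in sorted order."""
    return sorted(lang for lang, text in texts.items() if len(text) > limit)


print(over_limit(translations, MAX_LENGTH))
```

Running a check like this before strings go to the device gives authors and translators an early signal that a short form is needed.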
GK: So what about cases where people are trying to think about localization and they’re doing some shortcuts or work arounds and saying, Hey, we can just have this one piece in the middle of a sentence be a string, right?
SB: Yeah. Well, that can work. And unfortunately, it works very well in English, but doesn’t necessarily work very well in other languages. And there’s a number of things to consider. And even in English, there are some issues if you’re just going to be substituting a single word. For instance, the indefinite articles a and an, which depend on the following letter, is it a vowel or is it a consonant? So just trying to swap out a single word there is going to be problematic. It gets worse when you get into other languages, gendered languages. There’s a number of other considerations to keep in mind there. So you just have to be careful, know your languages, set your expectations for what languages you’re going to go to.
SB: Another thing to consider, and this is not just for string substitution, but if you’re using short words, if you’re using individual words, again, English has this nice facility where we’ll have words like file that serve both as nouns and verbs. And so, you could write file and it’ll work very nicely in one use, but when it gets translated, there’s a question. Is this to be used as a noun, a label, file? Or is this actually an imperative, a verb denoting some action? Do you have to file? As you’re thinking about localization, again, it’s really important to keep these things in mind. And again, this is where the metadata describing what this string does comes in.
GK: Yeah. This is something that we often caution people about when we know that localization is on the table or if they think it might be as they grow in scale. And that’s one of the reasons why some of the general advice that we tend to give is that if you are going to make a short phrase or a single word into a string that can be reused at different places, that you stick to something that’s going to be pretty safe, like a product name or a company name, something that is maybe not even going to be localized depending on how you do your branding, but something like that where the risk of the way the word is used is a lot less great than it is with just a normal word that’s part of your text. If it’s part of your brand and terminology, it’s likely to be a little bit safer when it comes to making it a string.
SB: Exactly. And so, yeah, sticking to product names and things, or just considering keeping it at the sentence-level. So your string should be the whole sentence. Even if that string has to get translated every time into a different language, in some ways that’s going to be a whole lot better, more predictable, get more predictable results than trying to do any of this swapping.
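The problem with swapping a single word, and the sentence-level alternative, can be shown in a few lines of Python. The template, the item names, and the string ids below are invented for this example.

```python
# Fragment approach: one template, with a single word swapped in.
template = "Insert a {item}."
items = ["cable", "adapter"]
fragments = [template.format(item=item) for item in items]
# The second result reads "Insert a adapter.", because the article does not adapt.

# Sentence-level approach: every complete sentence is its own string.
sentences = {
    "insert_cable": "Insert a cable.",
    "insert_adapter": "Insert an adapter.",
}

for text in fragments:
    print("fragment:", text)
for string_id, text in sentences.items():
    print(string_id, "->", text)
```

The sentence-level version means more strings to translate, but each one can be translated as a whole, so articles, gender, and word order stay under the translator’s control.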
GK: Yeah, absolutely. So if you’re using DITA and you’re working with strings or microcontent, what are some of the possible models that you might use for that?
SB: There’s a number of ways you can look at it in DITA. And of course, a lot of this is informed by how your strings are going to be used. One approach, and one some people might arrive at fairly quickly, is the idea of using keys. And there are some advantages there, but keys used directly may also run into some issues. They’re fine for single strings in isolation, but if the key itself needs to have any kind of DITA markup, you run into problems, mostly because of the DITA content model and what is allowed inside keyword, which is the element you’re going to be using if you’re using keys for short pieces of text.
SB: Now, I say directly, because we can also use keys to identify glossary entries. And a glossary entry topic actually is something really worth considering to store these strings, because already the glossary entry topic has a number of elements for usage, different forms. It has already elements that identify acronyms or expanded forms. And DITA itself is set up to process these with the abbreviated form element. There’s a lot of good things in DITA. And you may want to consider maybe not using glossary entry straight, but actually specializing it. And of course, that always has the advantages that you’re going to be working with much more, much better semantics. If you specialize, you can actually identify for your users exactly what they’re going to be doing.
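As a sketch of what a glossary entry used as a string store might look like, and how a delivery script could read it, the Python example below parses a small glossentry topic with the standard library. The entry content and the read_entry helper are assumptions for illustration, and the XML is only parsed here, not validated against the DITA DTDs.

```python
import xml.etree.ElementTree as ET

# Illustrative glossary entry; the id and text are made up for this example.
GLOSSENTRY = """\
<glossentry id="usb">
  <glossterm>Universal Serial Bus</glossterm>
  <glossdef>A standard for connecting devices.</glossdef>
  <glossBody>
    <glossSurfaceForm>Universal Serial Bus (USB)</glossSurfaceForm>
    <glossAlt>
      <glossAcronym>USB</glossAcronym>
    </glossAlt>
  </glossBody>
</glossentry>
"""


def read_entry(xml_text):
    """Pull the id, the full term, and the alternate forms out of one glossary entry."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "term": root.findtext("glossterm"),
        "surface_form": root.findtext("glossBody/glossSurfaceForm"),
        "acronym": root.findtext("glossBody/glossAlt/glossAcronym"),
    }


entry = read_entry(GLOSSENTRY)
print(entry)
```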
SB: Of course, you know, there is a downside to using the glossary entry. And that is just because it’s a great overhead. It essentially means for every string, you have to create a new topic. So this is potentially a vast number of topics you’ll have to create. So for some uses, that might be okay. For others, you may want to pull things together more. And so, you might consider creating topics, organize those topics with sections within topics, and then within those sections, you can either define individual words, strings. There’s a number of different ways you can do it. Again, you can do specialization. Several of the things we’ve done for people using strings, we’ve actually created specializations that help them manage the individual strings, the metadata that goes with those strings.
SB: Another thing we’ve seen is using tables. And tables themselves, of course, it gets back to the spreadsheet idea, but steer away from that for a moment. The nice thing about a table is you can have a column for the string itself. You can have a column with the ID. You can have a column with the description about where that’s going to be used. You can have abbreviated forms and so on. And of course, the advantage there, the differentiation with a spreadsheet, is if you’re going to be translating, then the translation occurs at the topic level. And so, you’ll have a separate topic for every language, or a separate version of that same thing in each language.
GK: We are going to wrap things up here and continue our discussion in part two. So thank you for listening to The Content Strategy Experts podcast, brought to you by Scriptorium. For more information, visit scriptorium.com or check the show notes for relevant links.