Applications of AI for knowledge content with guest Stefan Gentz (podcast)
In episode 152 of The Content Strategy Experts Podcast, Sarah O’Keefe and special guest Stefan Gentz of Adobe discuss what knowledge content is, what impacts AI may have, and best practices for integrating AI in your content operations.
“As a company and as a content producer who’s publishing content, you are responsible for that content and you cannot rely on an agent to produce completely accurate information or information that is always correct.”
— Stefan Gentz
- DITAWORLD 2023 recordings (YouTube)
- Adobe – Beyond Paragraphs
- Sarah O’Keefe – Is AI the meteor? Are we the dinosaurs?
- AI in the content lifecycle (white paper)
Sarah O’Keefe: Welcome to the Content Strategy Experts Podcast, brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize, and distribute content in an efficient way.
In this episode, we welcome Stefan Gentz from Adobe. Stefan is the principal worldwide evangelist for technical communication. He’s also a longtime expert in the space with knowledge of not just technical communication, but also localization and globalization issues. He’s here today to talk about the opportunities and applications of AI in the context of knowledge content.
Hi, everyone. I’m Sarah O’Keefe. And Stefan, welcome.
Stefan Gentz: Hello, Sarah. Nice to be here and thanks for inviting me.
SO: Always great to hear from you and look forward to talking with you about this issue. So I guess we have to lead with the question of knowledge content. What do you mean when you say knowledge content?
SG: It depends a little bit on the industry, but generally, there’s enterprise content and there are multiple areas in enterprise content and we all know marketing content and that beautiful content on marketing websites and advertising and so on, but there’s also a huge amount of other content in an enterprise and what kind of content that is is a little bit depending on the industry and which sector we’re looking at. But they also share a lot of content, which is produced across multiple industry verticals.
If you look at software, hardware, high-tech like semiconductors and robotics and so on, we have content like getting started guides, user guides, administrator guides, tutorials, online helps, FAQs and so on. But we also have things like knowledge bases, support portals, maybe API documentation, and you will find similar content in the automobile and industrial heavy machinery industry where you also have user manuals, maintenance guides, things like that, but also standard operating procedures, troubleshooting guides, safety instructions, parts catalogs and so on.
And when we look into industries like BFSI, banking, financial services and insurances, we have content like regulatory compliance guidelines. Of course, also policies and procedures, but also things like accounting standards documentation or terms and conditions, and again, knowledge bases and support portals, training portals for employees, et cetera, or partners.
And in healthcare, medical pharma, we have a lot of similar content, but we also have things like citation management, clinical guidelines, the core data sheets, CDS, dosage information, product brochures, regulatory compliance guidelines again, SOPs, maintenance guides and so on. And we have in other industries, things like installation guides, user guides, flight safety manuals in aerospace and defense, technical specifications of products, kinds of products and so on.
So there’s a huge amount of enterprise content that is produced in companies and marketing content is probably just a fraction of the content that is produced in other departments, like classic technical documentation, training departments, and generally, also as I just said, knowledge content producers or I think you originally mentioned product content which also fits, but I like to call it knowledge content because it’s a very broad term that covers not only knowledge bases as many people think, but all the content that carries and transports knowledge from the company to the user of that content.
SO: Yeah, I’ve also heard this called, I think we’re all searching for a term that encompasses the world of… It’s almost like not-marketing, not persuasive, the other stuff, other.
SG: Non-marketing content.
SO: I’ve heard it called enabling content in that it enables people to do their jobs, but of course, enabling has some pretty not so great connotations.
Okay, so we take your knowledge content and we wanted to talk about what it looks like to apply some of these recent AI innovations into the context of knowledge content. So what are some of the opportunities that you see there?
SG: There’s a huge amount of opportunities for companies using AI. Maybe we can break it a little bit down into two areas and let’s not talk about creative gen AI, like Adobe Firefly or Midjourney or so that are engines that are used to produce visuals, images, and graphics, but let’s talk about the written content here.
So I see two areas there and one is the area of authoring where content is created, and then there’s the area where content is used and consumed, whatever the consumer might be, maybe chatbot or chatbot interacting with an end user, or maybe even other services that use the content. And we can, of course, when we think from the content consumer perspective, a chatbot is definitely an area where AI can help to find content better and give better answers and maybe also rephrase content in a way that is appropriate to the content consumer. If I’m talking to, let’s say, 10-year-old children, or if I’m talking to a grownup with a university degree, they might have different expectations in how they want to get the content presented to them in terms of language, in terms of voicing, voice and sound.
SO: Right. The 10-year-old understands the technology and you don’t have to explain it to them.
SG: That might be, of course, true. Yeah, maybe they don’t even need the chatbot. So that’s the content consumer perspective, which AI can help to find better results, more fitting results, and produce nicer answers.
But there’s the other field where content is created with authoring content, and I see a lot of opportunities there. And at Adobe, especially in Adobe Experience Manager Guides, our DITA CCMS for AEM, there, we are implementing a lot of AI functionalities. I’m not sure how much I am allowed to talk about that, but we showed a couple of things at the last DITAWORLD, Adobe DITAWORLD in June where we presented some of the features that we’re implementing into AEM guides, into the authoring environment.
And one is, for example, the engine checks the content that an author is creating and compares it with the repository of content that is already there. And then makes suggestions like, “Oh, I understand that you’re trying to write that safety note, but there’s also a small snippet of content with a standard safety note in your company, so maybe you want to turn what you’re currently writing into a content reference, a conref, or maybe that single term that you’re writing there, you could turn that into a keyref because there’s already a key definition in your DITA map,” things like that.
So to assist the author to leverage the possibilities that their technology offers in a more intuitive way, instead of thinking for maybe minutes, “I remember I had written a note for that already, or I had already written that safety note,” the system will assist you with that and give you the suggestion, “Hey, this is already there. You can reuse that content.” That is authoring assistance.
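The reuse suggestion described here can be illustrated with a minimal sketch. It is not how AEM Guides works internally; it only assumes a small in-memory repository of approved snippets and uses plain string similarity from Python's standard library in place of an AI engine. The snippet IDs, the snippet texts, and the `suggest_reuse` function are hypothetical names invented for this example.

```python
from difflib import SequenceMatcher

# Hypothetical repository of approved, reusable snippets (IDs and texts are invented).
REPOSITORY = {
    "safety-note-unplug": "Unplug the appliance before cleaning or servicing it.",
    "safety-note-children": "Keep children away from the appliance while it is running.",
}

def suggest_reuse(draft, repository, threshold=0.6):
    """Return (snippet_id, score) pairs for snippets similar to the draft, best first."""
    matches = []
    for snippet_id, text in repository.items():
        # Character-level similarity between 0.0 and 1.0; a stand-in for a real engine.
        score = SequenceMatcher(None, draft.lower(), text.lower()).ratio()
        if score >= threshold:
            matches.append((snippet_id, round(score, 2)))
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

draft = "Unplug the appliance before you clean or service it."
for snippet_id, score in suggest_reuse(draft, REPOSITORY):
    print(f"Consider a conref to '{snippet_id}' (similarity {score})")
```

In a real authoring environment the comparison would run against the CCMS repository, and the author would still decide whether to accept the suggested content reference.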
We also showed, I think, some sort of auto-complete. So you’re starting to write the sentence and then a small popup comes up giving you a couple of suggestions how you could continue the sentence. And we all know this predictive typing thing for quite a few years, but usually, they are more created on classic statistical engines that try to predict what you want to write. But our solution there will take the repository of content that is already there in the database as a base for making suggestions that will fit much better than just a statistically calculated probability, how you probably want to continue the sentence.
So this kind of authoring assistance with auto-complete and predictive typing, that gets much better when you have an AI engine that understands your existing content and can build these suggestions on top of that. That is definitely one area.
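The idea of drawing suggestions from the existing repository rather than from general language statistics can be sketched as follows. This is a simplified illustration using prefix matching only, not an AI engine and not Adobe's implementation; the example sentences and the `suggest_completions` function are assumptions made up for the sketch.

```python
from collections import Counter

# Hypothetical sentences already present in the content repository.
EXISTING_SENTENCES = [
    "Press the start button to begin the drying cycle.",
    "Press the start button to resume the program.",
    "Press the power button to switch off the appliance.",
]

def suggest_completions(prefix, sentences, limit=3):
    """Return the most common continuations of prefix found in existing sentences."""
    normalized = prefix.lower()
    continuations = Counter(
        sentence[len(prefix):].strip()
        for sentence in sentences
        # Only sentences that start with what the author has typed so far qualify.
        if sentence.lower().startswith(normalized) and len(sentence) > len(prefix)
    )
    return [text for text, _ in continuations.most_common(limit)]

print(suggest_completions("Press the start button to", EXISTING_SENTENCES))
```

Because every suggestion comes from content that already exists in the repository, the completions stay within the company's established wording.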
SO: We’ll make sure to include a link to that presentation, which I actually remember seeing, in the show notes. So for those of you that are listening, it was at DITAWORLD 2024 and-
SG: 2023. You’re quite ahead to the future.
SO: I’m sure there will be an AI presentation at DITAWORLD 2024, however-
SG: Oh, I’m very sure. Yeah.
SO: Yeah. So this year, the 2023 presentations had this demo of some of the AI enablement that’s under development, and we’ll get that in there for you.
SG: Yeah. So these two areas are definitely areas where AI will help authors in the future, but there are many more things. For example, when you think in terms of DITA, you have that short description element at the top and an AI engine is pretty good in summarizing the content of that topic into one or two sentences. And if you try to do that as a human being and you have your topic in front of you with maybe 10 paragraphs, a couple of bulleted lists and a table, and then trying to find two sentences that are basically the essence of the topic and making two nice sentences, “This topic is about dah, dah, dah, dah, dah,” that is quite hard for a human being, and an AI engine can do that in two seconds.
This is another area where AI will help people to get that job faster. And of course, they can then take that suggestion or not or rephrase it and rewrite it if they want, but they can take it as a starting point, at least. Short description summarizing the content.
It’s also rewriting content, maybe for multiple audiences. Originally, a couple of months back, I bought a tumble dryer from a German company for household appliances, and they have that classic technical documentation that comes with a tumble dryer explaining in long written sentences how to use it. And there are better concepts sometimes to do that. For example, a step list. And I copied and pasted three, four paragraphs there and said, “This is classic documentation. Can we write that in a simpler way, maybe as a step list in DITA?” And then I got a step list with the paragraphs broken down, the steps that are described in these paragraphs broken down into a step list, step one, step two, step three, and so on. And that made the content much more consumable and accessible.
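To show what such a step list looks like in DITA, here is a minimal sketch that builds a `<steps>` element, with one `<step>` and `<cmd>` per instruction, using Python's standard library. The three instructions are invented placeholders, not the manufacturer's documentation, and `build_steps` is a hypothetical helper. Breaking the prose into individual instructions is the part the AI engine did in the anecdote and is not shown here.

```python
import xml.etree.ElementTree as ET

def build_steps(commands):
    """Wrap each instruction in a DITA <step><cmd>...</cmd></step> inside <steps>."""
    steps = ET.Element("steps")
    for command in commands:
        step = ET.SubElement(steps, "step")
        cmd = ET.SubElement(step, "cmd")
        cmd.text = command
    return steps

# Invented example instructions; real content would come from the approved documentation.
commands = [
    "Load the laundry into the drum.",
    "Close the door.",
    "Press the start button.",
]

steps = build_steps(commands)
ET.indent(steps)  # pretty-printing helper, available in Python 3.9 and later
print(ET.tostring(steps, encoding="unicode"))
```

The resulting element would sit inside the `<taskbody>` of a DITA task topic, and an author would still review the order and accuracy of the steps before publishing.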
And so one could use AI here and say, “Okay, here’s my section in my DITA topic, for example, with the legally approved official technical documentation content,” and then I just duplicate that and let it rewrite as a step list maybe for the website. And then I could even duplicate it again and say, “Now let’s rephrase that for multiple audiences,” and say, “Okay, I have that TikTok generation person in front of me and they want to be addressed in a more personal, more loose language, more fun language, and please rewrite that content for this audience.” And then the engine will rewrite that content and say, “Yeah, hey, yo, man, you can put your dirty clothes into the dishwasher or into the tumble dryer, not the dishwasher. And you will have a lot of fun watching how it’s rotating when you hit the start button.”
And then you can change. That’s, of course, an extreme example, but you can create multiple variants of your content for different audiences very easily then. And I see that, and a lot of people are talking about doing that on the front end, on the website for example. I see that more from a responsibility perspective, on the authoring side when an author is doing that and approving it, so to say, maybe checking it if the information is really correct, the steps are really in the right order, whatever, and then it goes checked for different audiences into the publishing process in the end because that responsibility is, I see that not on the AI engine, I see that on the author who needs to make sure that the content is still accurate and correct.
SO: And I think that’s a really important point because at the end of the day, the organization that is putting out the product and/or the content that goes with the product, they can’t say, “Oh, I’m sorry. The AI made the content wrong. Too bad. So sad.” I mean, they are still responsible and accountable for it, which actually brings us very elegantly into the next topic that I wanted to touch on, which is what are some of the risks, some of the potential challenges and red flags that you see as we start exploring the possibilities of using AI in our content ops?
SG: That’s a very important topic I think to talk about because even very advanced engines like ChatGPT come with certain challenges and problems. There is, of course, we just talked a lot about, is the information correct or is it inaccurate or is it maybe even just invented by the engine? Usually, people call that hallucinating by just generating content and it would continue to generate content as long as you want and it will invent things.
And I was throwing some content to ChatGPT and said, “I want to write a nice blog post or a LinkedIn article. Can you give me some quotes that fit to the content that I have provided you?” And it provided me five, 10 quotes that sounded like some CEO would have said that, and it was even giving some names. And then I was asking, “Is that person, John something, really existing? And is that a real quote?” “No, I invented that, but it might fit. It could have been said by someone.”
SO: It could be real.
SG: Yeah, it could be real. That was basically the answer that ChatGPT was giving. That comes with a huge problem because as a company and as a content producer who’s publishing content, you are responsible for that content and you cannot rely on an agent to produce completely accurate information or information that is always correct because it will always generate content and will not let you know that it generated that content.
And this is why I also say there’s no danger that human writers or content producers will get jobless because of such an engine. No, the role will change. Maybe we use these engines more to generate content, but we as authors become more the reviewer and the editor of that content. It’s a little bit like machine translation where you had a machine translation engine translate your content, but then you need to do post-editing to make sure that this content and the translation is really correct and that the correct terms are used and so on. And we will see a similar development with gen AI for text-based content for sure in the future when it comes to all kinds of content production, maybe technical documentation, maybe knowledge bases, et cetera.
SO: So then can you talk a little bit about the issues around bias and the ethics of using AI and where that leads you?
SG: An AI engine like ChatGPT, for example, is of course trying to create unbiased content, but we were talking about that. I don’t have an example for that, for written content now, but we were talking about that example from the lady who was giving a photo of herself and then asked the generative AI engine, “Please make me a professional headshot photo for an interview letter.” And it created a nice photo with nicely made-up hair and some nice dress and so on with a nice background and looked very professional, like a professional headshot from a professional photographer. The only problem was that this photo was showing a person with blue eyes and blonde hair while the person who provided the original photo to be beautified was an Asian person with a different look.
And that brings that discussion of the bias of an engine. Maybe it was fed and trained with 5 million photos of professional business people from a Caucasian background and maybe just 1 million photos from an Asian background and maybe even less from an Indian background or whatever. And then this engine is making statistical calculations and says, “You want to turn that into a professional business photo? Based on my training set, my training data set, I will make you a Caucasian-looking person.” And that is a huge problem.
And this is where this governance of AI generated content will maybe even become a full job one day where we say we need to make sure that the content that an AI engine is generating is really appropriate and culturally sensitive and is not biased and taking all kinds of other factors into consideration, and maybe an AI engine is not yet able to do that.
SO: Yeah. So the question of what goes into the training set is really interesting because of course, it is a little unfair to blame the AI, right? The AI is, in its training sets, reflecting the bias that exists out in the world because that’s what got fed into it.
And I don’t want to go down the rabbit hole that is deep fake videos and synthetic audio, but I will point out that just earlier this week, I saw a really, really interesting example of an engine where somebody took a video of themselves speaking in English and talking about something. Actually, they were sort of saying, “Hey, I’m testing this out. Let’s see what happens.” And then the AI processed what they said, translated it and regenerated the video with them speaking first French and then German.
And so it was, I don’t want to say live video, it was synthetic video of a person who spoke in one language and who was then transformed into that same person speaking fluently in their voice in a different language that they do not in fact speak because the content was machine-translated, and then they used the synthetic audio and video on top of that to generate it.
I mean, my French isn’t very good. It sounded plausible. The German sounded fine. I heard one mistake, but he sounded like a fluent German speaker, and there wasn’t any obvious weird rearrangement. They somehow matched it onto there. It was quite impressive and it was fun to watch. And then you think about it for a split second, and you realize that this could be used in many different ways, some of which are good and some of which are not.
SG: Yeah. I mean, we had some really ugly examples here in Germany where some political party was using gen AI photos to transport a certain political message, and then it came out that these photos were not from actual events that they were claiming it would be, but were AI-generated.
So there’s a lot of danger in there, and we will also need to adapt as societies and human beings to get a better feeling for what is generated content and what’s not. That will become increasingly difficult, but at least developing the awareness that what we get presented as content, especially when it comes to images, may be generated, that we’ll need to develop stronger than ever before. Photoshop has been there for a long time. We all know that photos can be Photoshopped, but with this new approach of generative AI, this awareness becomes even more important.
But when we talk about ethics, I know we are running a little bit over time probably, but there’s another aspect in ethics that I see as something we need to discuss in more detail in the future. We feed the engines with content, existing content, and maybe it’s content that is even intellectual property of someone. And then this engine produces new content that is leveraging the knowledge of that, that is in that content, to produce new content. And then something, especially in the context of university content, research content and so on, who’s the owner of that content that is newly created? And whose intellectual property is it? And what is, if content is generated that is very clearly rephrased of existing content from some content that is maybe protected by licenses or so?
So there’s also this ethical discussion that we need to have and that will for sure maybe even need some regulation on the government level in the future.
SO: Right. And the answer right now, at least in the US is that if the content was generated by a machine, you cannot copyright it. That implies that if I feed a bunch of copyrighted content into the machine and produce something new out of the machine, that I have essentially just stripped the copyright off of the new thing, even if it’s a summary of the old thing or a down-sampling or a gisting of the old thing, the new thing is not subject to copyright unless there is significant human intervention.
So yeah, I think that’s a really good point because there’s a big risk there. And there’s also the issue of credit. I mean, if I just take your content and say it’s mine, that’s plagiarism, but if I run it through an AI engine and plagiarize from millions of people, then it’s suddenly okay. That seems not quite right. Okay, so yes, tell us-
SG: A plagiarism engine that checks the content is probably very useful in the future, yeah.
SO: Yep. So lots of things to look out for. And I think it sounds as though, from what you’re saying, you see a lot of potential benefits in terms of using AI as a tool for efficiency and recombination of content.
So if you join me in, I’ve already moved on apparently to DITAWORLD 2024, so if you look ahead a year or so, what do you see as the opportunity here? How are companies going to benefit from doing this, and what kinds of things do you think will be adopted the fastest?
SG: I think coming back to the beginning basically, these two areas of authoring and authoring content, content creation and content consumption, and these are the two fields where companies can benefit and will benefit from the near future as soon as enough of these new features will have found their way into the tools themselves.
Faster content production is definitely one part, but that also means that authors need to learn how to create content with AI engines, the art of prompting as a keyword here, and to detect the voice and tone of generated content. It’s relatively easy after a while to identify, oh, this content was written by ChatGPT, for example, because the standard way ChatGPT is generating content is sort of always the same, and you can easily identify it after a while. This will give some job changes and means that companies will need to adapt to that before they can really benefit from it.
People, authors, and content creators need to learn how to get the right content out of an engine, out of prompting, prompt engineering, how to write proper prompts, and that will take some time and trainings and so on, but then it’ll really speed up the content production process a lot. And the second benefit is then with the content consumption, providing just better customer experiences by having more intelligent chatbots that provide better answers, right-fitting answers, maybe assisting users of a long blog post on a website with giving a small summary of that and things like that.
So there will be many benefits for companies using AI, even just when it comes to this specific area of content, knowledge content, but there will be other areas of course as well, financials, detecting patterns in financial data, and so on, for research and so on. There will be a lot of benefits, but when we talk about content, the content we are talking about here today, the biggest benefits will probably be in content production, which also includes, for example, translation.
SO: Yeah, I think I agree with that, and that sounds like a relatively optimistic note to wrap things up on. Stefan, thank you so much for all of your perspectives on this. You’ve obviously thought about this carefully and you’re sitting inside an organization at Adobe that is actually building out some tools that are related to this, and I’ll be interested to see what comes out.
Tying back to that, the DITAWORLD 2023 recordings are available, and we’ll put those in the show notes. There were a couple of presentations in there, this was back in May, June, that addressed the state of AI and some of these similar kinds of considerations along with that. I’m not sure if it was exactly a demo, but there was a discussion of what the AEM Guides team is thinking about in terms of product support. So we’ll make sure to get that into the show notes.
Scriptorium has a white paper on AI and we’ll drop that in there, and then I think there will be more discussion about this going forward. So thank you again for being here, and we’ll look forward to hearing more from you.
SG: Thank you.
SO: And with that, thank you for listening to the Content Strategy Experts Podcast, brought to you by Scriptorium. For more information, visit scriptorium.com or check the show notes for relevant links.