October 30, 2023

How machine translation compares to AI

In episode 154 of The Content Strategy Experts Podcast, Bill Swallow and Christine Cuellar discuss the similarities between the industry-disrupting innovations of machine translation and AI, lessons we learned from machine translation that we can apply to AI, and more.

“Regardless of whether you’re talking about machine translation or AI, don’t just run with whatever it provides without giving it a thorough check. The other thing that we’re seeing with AI that wasn’t so much an issue with machine translation is more of a concern around copyright and ownership.”

— Bill Swallow

Transcript:

Christine Cuellar: Welcome to the Content Strategy Experts Podcast, brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize, and distribute content in an efficient way. In this episode, we’re talking about the parallels between AI and machine translation. Hi, I’m Christine Cuellar, and with me on the show today I have Bill Swallow. Bill, thanks for coming.

Bill Swallow: Hey, thanks for having me.

CC: Absolutely. So for non-technical people like myself, what are we talking about when we say machine translation, is that like Google Translate? What are we talking about there?

BS: Google Translate’s a form of it.

CC: Okay.

BS: But essentially, yeah, it’s a programmatic way of translating from one language to another.

CC: Okay.

BS: It’s been around for quite a while and we see it commonly in Google Translate and other online uses, but it’s actually been around for quite some time.

CC: Okay. So I know that as AI has become the biggest topic in 2023, we’ve often compared it to machine translation. I know we’re going to talk about that throughout the episode, but can you give just a little intro to why they’re compared so often?

BS: Yeah, I think it boils down to really where machine translation started.

CC: Okay.

BS: So I’m not going to give you years because it’s not at the top of my head, but it basically started out as a rules-based program. So people sat down and wrote these if-then-else statements, essentially, to say, if you come across this phrase, then it’s translated in this way for this language.

So they started out with that rules-based approach, and they’ve beefed up the rules and they’ve beefed up the processing. And of course, they improved the examples on the backend of the finished translation and modified that so that the translations kind of became a little bit better over time.
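
[Editor’s illustration: the rules-based approach Bill describes can be sketched roughly as below. This is a minimal Python sketch, not how any real machine translation product was built; the `RULES` phrase table and the `translate_rule_based` function are hypothetical names made up for this example.]

```python
# Hypothetical phrase table (English -> Spanish), for illustration only.
RULES = {
    "good morning": "buenos días",
    "thank you": "gracias",
}


def translate_rule_based(text, rules):
    """Apply one if-then-else rule: known phrase -> stored translation."""
    key = text.strip().lower()
    if key in rules:
        # "If you come across this phrase, then it's translated in this way."
        return rules[key]
    else:
        # No rule matches, so the phrase is passed through untranslated.
        return text


print(translate_rule_based("Thank you", RULES))
print(translate_rule_based("See you later", RULES))
```

[The second call shows the limit of the approach: a phrase with no rule comes back unchanged, which is why the rule sets had to keep growing.]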

CC: Okay.

BS: Then they switched over in many cases from the rules-based to more of a machine learning model, which then, basically it’s like early AI. So it started to learn patterns and it started to learn about context a bit based on the words and phrases that were being used and could draw additional inference from that.

CC: Interesting.

BS: And essentially that started to develop more and more until we got to an AI use case. So it’s something where you actually get this robust use of machine translation. So it’s actually using a lot more learning models in that translation process. And the machine translation process is a little odd because you can do it out of the box, so something like using Google Translate where it basically uses its own Google index as a resource for translating a lot of that content. But a lot of translation companies and a lot of companies that employ machine translation, whether they are translators or not, some companies do it in-house on their own. They will basically train their machine translation against their own content and their own translated store of content so that it brings back their approved wording, their approved language models.

CC: Gotcha. I can totally see how that is a big parallel to AI right now as we’re talking about having an internal AI versus just throwing content in ChatGPT. That makes sense. You mentioned there was a transition into machine learning. When that happened, how did people react? Was it really similar to how people are reacting with AI? Was it split? What was that acceptance like?

BS: Yeah, I think there were some parallels there. Just as with what we’re seeing with AI now, there’s a lot of concern from people saying, oh, the machine is going to essentially rule, make my job obsolete because it can now write these blog posts, it can write these screenplays, it can develop these characters, it can produce these images. But with machine translation, there was that similar kind of fear where translators were like, oh, it’s going to reduce my margin. It’s going to put me out of a job. But we haven’t really, we saw that to some extent in the very beginning, but what we’ve found over time is that no, the people are still required to go in, proofread that machine-translated content, clean it up, make it more appropriate, and essentially improve what’s on the backend that the machine translation is pulling from so that things are improved over time.

CC: Yeah, process updates and that kind of thing. Improving the bank of-

BS: Right, improving the phrases, getting rid of things that are no longer said in certain areas because language is ever-evolving.

CC: Yeah, that’s true.

BS: You need to be able to keep up with those changes.

CC: That’s true. And how far ahead would you say that machine translation is compared to AI? Is it five years in the future so we can maybe see what might be coming? I know that’s probably really hard to quantify.

BS: Let me get my crystal ball.

CC: Yeah, yeah, there we go. Give us an exact answer.

BS: I’d almost say that they’re on two parallel, but different paths.

CC: Okay.

BS: And that I think we’re going to see a lot more blurring of the lines. Those paths are going to start to come together a little bit more. I mentioned that machine translation is leveraging AI to a good degree these days because it’s the next step in that form of machine learning. It’s no longer a core programmatic learning model, but it’s more of an adaptive one. So it basically will influence its own way of learning about stuff going forward. AI is employing machine translation to many degrees. We saw there was a video floating around LinkedIn of this new utility where, and I think Sarah spoke about it on a previous podcast with Stephan Gentz. But yeah, you basically record yourself saying something and it will turn around, machine translate that content, use your tonal voice and basically re-speak, and then re-sync the video so that it looks like you’re speaking a completely different language.

CC: That’s crazy.

BS: It’s nuts. I watched the video a few times. I don’t know either language. I think they used French and German. I know enough German to be dangerous, and I know enough French to order a meal.

CC: It’s the priority.

BS: But the German I found was actually pretty spot on from what I could understand of it. And I know Sarah speaks pretty much fluent German, certainly more than I do, and she only found really one mistake, I think.

CC: That’s crazy.

BS: It’s crazy. So there are cases where things are being employed, and I think we’re going to see a lot more of that.

CC: Okay.

BS: On the machine translation side, we’re certainly going to see it adopting more robust AI models so that it can continue to build and improve how machine translation is being done. On the flip side, I do think that AI will be leveraging more of the linguistics modeling that is baked into machine translation so that it can do a better job of representing essentially the human construct of language.

CC: Wow. That video example that you gave and that Sarah shared before, I feel like that’s one of those examples where, I don’t know, 50 years from now we’ll look back and the kids will be totally used to stuff like that, and we’ll say, I remember back in my day that was a big deal. Anyways, it’s just mind-blowing that this kind of stuff is happening. So speaking of those kinds of innovations and industry disruptions, we’ve talked a little bit about how machine translation and AI are kind of on parallel paths that are merging together. What are some of the ways that the disruptions have been really different or have created different things for the content industry?

BS: I don’t know if there’s any real difference in how they might be disrupting the industry or how they might be employed. There are differences from a practical matter, when would you deploy a translation management system versus when could you use… Well, AI is kind of a really nebulous term. It could mean anything. It could mean ChatGPT, it could mean image rendering software. It could mean really anything. With machine translation, we’ve seen it become more of a daily utility. So you come across a news article in another language, if you’re using Google Chrome, you might have the option to translate this page. If you’re not using Google Chrome, you can go to, for example, translate.google.com and just provide it the URL or copy and paste a paragraph, and you can basically get an idea of what that website’s talking about. But we’ve seen it become more baked into applications as well. Certainly, there’s a whole industry around providing translation services. So we’ve seen that kind of pick up the pace on round-tripping translation work.

So before you would have someone sit down and actually translate a block of text and they would use translation memory, which is essentially a store of what was translated last time, to kind of pull from and pre-fill the translation, and then that way they can fill in the gaps. That’s a very, very high-level view of translation memory, but essentially it takes that to the next level where it will pre-process the translation for you and provide you with something that’s maybe 95% there. And then you would get someone who is an expert in the language and the subject matter and the target locale, because we know that Spanish is different depending on where you are in the world, for example. And they would proofread it, clean it up, and probably commit that back to whatever the machine translation is using so that it uses that reference next time rather than having to go through that again.
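
[Editor’s illustration: the translation memory workflow described above (pre-fill from stored translations, have a person fill the gaps, then commit the approved result back) might look something like this minimal Python sketch. It assumes exact-match lookup only; the `MEMORY` store and the `prefill` and `commit` functions are hypothetical names, not the API of any real translation management system.]

```python
# Hypothetical translation memory: approved source -> target segments.
MEMORY = {
    "Press the power button.": "Pulse el botón de encendido.",
    "Close the cover.": "Cierre la tapa.",
}


def prefill(segments, memory):
    """Pair each source segment with its stored translation.

    Segments with no stored translation get None, marking a gap
    for the human translator to fill in.
    """
    return [(segment, memory.get(segment)) for segment in segments]


def commit(memory, source, approved):
    """Store a reviewed translation so it is reused next time."""
    memory[source] = approved


draft = prefill(["Close the cover.", "Replace the filter."], MEMORY)
for source, target in draft:
    print(source, "->", target if target is not None else "[needs translation]")

# After a human translates and approves the gap, commit it back.
commit(MEMORY, "Replace the filter.", "Reemplace el filtro.")
```

[Machine translation takes the same loop further by proposing a draft for the gaps as well, but the review and commit steps stay with a person.]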

We see it baked into applications as well. So there are some gaming applications that will auto-translate a chat on the fly so that depending on, no matter where you are in the world, you can actually still understand what these players are saying. So if you’re on a team and someone’s saying, go now, and you don’t speak their language, you have no idea what they’re saying.

CC: Yeah.

BS: But the chat translation can kind of help. It’s not perfect, but it’ll help. With AI, I kind of see that moving into a similar role. It’s going to be, now we’re looking at it as, oh, look what this thing can do. It can write me a limerick. It can essentially create me a photorealistic image of whatever I choose to think up. I give it a description and it creates something, and it might be what I’m looking for, and it might not, and there are flaws to those as well. But I kind of see AI being baked more into the backend of a lot of the tools that we use on a daily basis to help with more robust search and query activities, to be used as an editor or a checker for things on the backend, to be a starting point for developing something new.

So whether it’s a piece of code or in our world where we do structured authoring work, it could be something as easy as give me a framework for a new task that I need to produce, and it will lay everything out. That kind of harkens back to more of a template, but you can kind of say, give me a task based on what I’m writing about here in this section, and it can pull some pieces in and fill things out. So I also see it as being more of an aid for finding resources that already exist so you’re not reinventing the wheel and things like that. So things that essentially it’s going to be baked into a lot of different utilities that some of which we use now, some of which we haven’t thought of yet that will make our lives easier.

CC: Okay. So what are some of the pitfalls that we fell into during machine translation that we can avoid with AI? Do you have any red flags or things to watch out for based on how things went the last time, essentially?

BS: Yeah, I think the biggest one is to not take what it provides for granted.

CC: Okay.

BS: So regardless of whether you’re talking about machine translation or you’re talking about AI-produced whatever, don’t run with whatever it provides without giving it a thorough check. I think the other thing that we’re seeing though, and it wasn’t so much an issue with machine translation, is more of a concern around copyright and ownership.

CC: Okay.

BS: So who essentially owns the rights to these things?

CC: Yeah.

BS: And it kind of goes back to, well, what was the model used to kind of create them in the first place? Was it using a public domain model or was it something that was trained only on a private store of information?

CC: Yeah. So looking into the future, do you see private AI being maybe the best way to move forward with AI? Not that people will necessarily, or there’s maybe some use cases for public domain AI too, but do you see that though as more of where we’re going to head?

BS: I think it’s inevitable. I think that we’re going to have cases where, we’ve seen cases already where companies have kind of uploaded examples of their own code to see if they could get a public AI model to write more code based on that model. And unwittingly, they basically let their own IP out into the wild, so now everyone can use what these people created that they uploaded in the first place. So that’s an oopsie. So I think that based on cases like that, I think people are going to start employing a private model, basically a walled garden where they can train and develop their own corpus of information, whether it’s images, code, text, what have you, and use that to produce things using AI.

But I still think that, yeah, there’s going to be a public model for, I don’t see that need ever going away. Just as we have public models for everything else that we use on the internet, I think we’re going to see AI have its own footprint there as well. We might need to be careful while using it. There might need to be more guardrails attached, but I don’t see that going away.

CC: Yeah, that makes sense. And you mentioned that with concern for people’s jobs, I know that of course is a concern right now with AI as well. And you mentioned that at the beginning of machine translation, that was, you did see a little bit of job loss, but overall, those experts were still needed to manage the content, make sure that everything that is being created is accurate. So what would you say to people that are really concerned about that right now? Do you think that’s going to be really similar for AI? Are there any differences you can think of?

BS: I think this is actually a good learning point from machine translation because yes, some people lost their jobs initially when machine translation came out. I think in hindsight, that was an error, or that was a bad decision, to let people go, saying, oh, a machine can just do it. Because it was very clear out of the gate once machine translation really started being used that people are still needed. They’re still needed to clean up what the machine translation is producing. They’re still needed to do new translations into new markets in new contexts with new terms. A machine just can’t invent things and have it be correct for a very specific target audience.

To do any kind of translation correctly, you need to know the subject matter. You need to know the language it’s being spoken in, the flavor of that language for the locale you are targeting with that content, and anything else about that locale that might influence jargon or anything else that might need to be employed. So yeah, I see a similar warning, I guess, for people who are looking at AI and saying, oh, we can reduce our staff by employing AI. It’s like, no, you’re going to augment your staff and they are going to need to learn new skills because they are going to need to learn how to leverage AI to produce basically more and better work. It’s a utility, it’s not a replacement.

CC: Yeah, I liked how you phrased that. I think that that’s a good perspective for employers, for writers, for anyone who is worried about the job climate right now, I think that’s a good way of looking at AI.

BS: And as of right now, we know that AI is being used to generate articles on the web. There are a lot of websites that are using AI to just basically pump out post after post after post, article after article after article. And you can tell immediately once you start reading it that it was not written by a human.

CC: And at the end of the day, it’s still humans connecting with humans. So whatever content we’re putting out there, it needs to be valuable to people that are reading it. It needs to have a purpose, it needs to be doing something. It needs to just be humans communicating with humans. So those big content pumping blog posts, all that kind of stuff, that does bother me because it’s just content for the sake of getting content out there. And there’s humans at the other side that actually need information. So I think this is a really good perspective to have for how to leverage AI in the same way that we’ve leveraged machine translation, how to automate processes, how to have a starting place for people when you’re writing, but not to just make it all about machines and not people. So Bill, is there anything else that you can think of when you’re thinking about machine translation? Any other comparisons between that and the rise of AI? Anything else that you wanted to share before we wrap up today?

BS: I’d say approach it both optimistically and cautiously.

CC: Yeah, that’s really good, especially with the concerns that you mentioned about copyright. We do have an article that Sarah O’Keefe wrote and recently updated as well about AI and the content lifecycle. So we’ll post that in the show notes. Also, some other interviews and information that we’ve provided about AI. So all of that will be linked in our show notes. And Bill, thank you so much for joining the show and talking about this today. I wasn’t in this space while machine translation was happening. It’s really interesting to hear about the parallels because they really are very similar in a lot of ways, and it’s cool that we have some takeaways from both.

BS: Thank you.

CC: Thank you for listening to the Content Strategy Experts Podcast, brought to you by Scriptorium. For more information, visit scriptorium.com or check the show notes for relevant links.