Full transcript of death of PDF podcast

Sarah O'Keefe / Podcast transcript

00:00 Alan Pringle: Welcome to The Content Strategy Experts podcast, brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize and distribute content in an efficient way. In episode 17 we discussed PDF. In a previous podcast, we talked about the death of training. So, now it’s time to discuss the death of PDF.

00:28 AP: Hi, everyone, I’m Alan Pringle. I’m chief operating officer at Scriptorium Publishing, and I’m here with Sarah O’Keefe.

00:34 Sarah O’Keefe: Hello, everyone, I’m CEO here at Scriptorium.

00:38 AP: Is PDF dead? Is it dying? Is it still in high demand? That’s the kind of thing that we want to talk about today.

00:45 SO: And since everybody really enjoyed hearing about the death of training, in which we concluded that training was not in fact actually dead, we thought we’d use that same theme here. And I guess we’re going to have an ongoing series on death of things in tech comm.

00:58 AP: Things. Yeah. Yeah. So, let’s talk about the advantages of PDF. It’s been around for a very long time and there are some reasons why. Let’s cover those up front. I think that’s only fair.

01:12 SO: Right. When PDF came along in the… What? Early ’90s?

01:18 AP: Yeah. ’93-ish somewhere around there.

01:18 SO: Yeah, I think that’s about right. It, initially, was a replacement for delivering, for example, PostScript files to the printer, which in turn were a replacement for delivering camera-ready copy, which is to say physical pages that the printer would take a picture of and then turn into copy and that kind of thing. Or turn into a printed document. So, PDF comes along which has the great advantage that you can get complete page fidelity. You can look at it and see what it’s going to look like, which you never could really do with PostScript. You can control the presentation and then, almost immediately, it went from being the sort of print intermediary to “Hey, we don’t have to print documents anymore. We can just deliver PDF and shift the responsibility and the cost of printing those documents onto our customers.”

02:13 AP: Yeah. I remember getting CDs, the install CDs for software. And a lot of times, instead of having a printed user manual inside the box, you would have a PDF file that they had put on a disk or sometimes on the floppy.

02:27 SO: Right. Because back in the days of boxed software, and here we are showing our age again, the actual printing of the documents, of the user guide, the install guide, whatever, was a significant chunk of actual money.

02:32 AP: Yep.

02:43 SO: And then you have inventory problems because you’d print the docs and then find a mistake, and need to either throw those out or add errata pages, or do various kinds of things. So PDF is much easier to update in the sense that you can just send out a version 1.01 and say, “Delete your current PDF. Use this one instead.” So those were all, and are all, huge advantages that PDF offers over paper, over PostScript. And so those are all big pluses.

03:13 AP: Yes. Now, there are also some negatives. And I think those negatives have kind of amplified themselves over the years, especially with the advent of websites and web content because we shifted from this, “Let’s go to print… ” To, basically, a soft version of the book. And, “Okay. Let’s just throw that soft version of the book up on the web and make the customers download it from there, instead of a physical disk.”

03:39 SO: Right. “We’re not even going to give you the PDF on a CD-ROM.”

03:43 AP: Right.

03:45 SO: “We’re just going to put it on the website and make you download a 20 megabyte or a 100 megabyte PDF.” So, they’re huge. But even that, as long as it was sort of a print replacement, I think that was sort of okay. The problem arose when the web itself became the way that we want to consume information. I don’t want to download a PDF, open it up, wait for it to open and look at stuff. I want to just get the information I’m looking for.

04:14 AP: It comes to this 24/7 connectivity world that we live in where we say, “I want to find this one little bit of information and not have to look on page 45 of a 60 or 70 page PDF file.” And a really good example of this, and this is one of my pet peeves, when I am traveling and I have my phone or my iPad out, and I want to find out what the local restaurant has for dinner, lunch, whatever, I will go check out the menus. Inevitably, I end up downloading a scanned PDF of their menu. And on my phone, that is not exactly the ideal way to find out what that restaurant has ’cause I’m pinching and I’m enlarging and whatever else, trying to see what they have. And that really is not ideal and, I think, is one of the big weaknesses of putting PDF out on the web.

05:17 SO: Right. So, first, you have to download a big file over terrible hotel Wi-Fi. And then you have to attempt to look at this huge page on a small screen. And then the third thing, which doesn’t really apply to your example, but the… PDFs are not very good at search or at search engine optimization. So if I’m looking, as you said, for a particular piece of information that’s on page 500 of this huge PDF, I give up long before that PDF actually downloads, and now I’m really angry with the person who produced it. That then, brings us to this question of, does that mean that we should kill off PDF?

05:55 AP: Well, that sounds, maybe, good on the surface in theory. But there are several reasons why that probably won’t work. And one of those is some people still like to print things. Some of your customers who want your content, they may like the idea of, “Yeah, I want to print this little part out and carry it with me.” So, a lot of times, your customers, and you want to call them old school or whatever, they like PDF, they want to continue to use it. And in that case, I don’t think you can just arbitrarily say, “You know what, PDF is ancient, it is not useful anymore. We’re just not going to do it.” You better poll your customer base very carefully before you make that kind of decision and just snatch it and say, “No, we’re not doing it anymore.”

06:45 SO: So then what’s the solution here? You don’t kill off PDF, you keep delivering it, but how do you solve all these other problems?

06:52 AP: Well, I can think of one of our clients, in particular. They came up with a very good solution for this. They knew that searches were a problem with their giant PDFs. And some of these PDFs are thousands of pages, they are enormous. So, they realized, “We have to break this up in smaller HTML chunks.” But at the same time, they knew that some people still wanted that big PDF. So, what they did is they said, “You know what, we’re going to kind of turn the focus on the different outputs and we’re going to say the HTML channel, the HTML output is now the default.” So you go search on Google or whatever site to find, and you hit one of their web pages that will come up. When you get that web page, you then have the ability to download a PDF of that publication. You can also download an EPUB eBook. So their idea was, “Yes, we’ll still give that PDF to you, but we no longer consider it the default output on the web.”

07:54 SO: And I think it’s worth pointing out that in most authoring environments, you can actually deliver both PDF and HTML from the same content. So, you don’t really have to make the choice as an author. You don’t have to say it’s either or. You can actually deliver both. Now, most tools are going to be biased towards one or the other. So, they’ll do better print or PDF, or they’ll do better HTML. But you can still get decent PDF out of an HTML first tool and decent HTML out of the PDF first tool. You have that choice. I was in a presentation not too long ago and somebody said to me, “Well, you keep talking about HTML delivery and how that fixes SEO and fixes the mobile problem, fixes all these things.

08:46 SO: But I know we have customers that need PDF. So, what do we do?” And I was kind of appalled to discover that, apparently, I hadn’t said, “Well, do both.” I just launched into this, “This is why you need HTML.” But I never said the words, “And you can also keep delivering PDF,” which is, typically, what our customers end up doing. We’ve had a few that have ditched entirely and said, “We’re not doing that. You can print HTML pages if that makes you happy.” But most customers with product content are saying, “Okay, we need both.”

09:20 AP: Yeah, multichannel, omni-channel, whatever slang you want to use to describe all those output paths, it is not an either-or, or must-do-one situation, it is… You could have many out there and customers in different contexts may want one format over another. Another example I can think of is if you had technicians that are out at a site that is, say for example, underground where there may be no internet connectivity, web pages are not going to be helpful to you. A PDF downloaded to a device could help in this situation or an EPUB. There’s context involved in here that you have to think about, about how your customers are going to use that information.

10:04 SO: And that’s actually… That’s an interesting point because, I think, one of the things that we… I don’t know about overlook but maybe de-emphasize when it comes to PDF, and you just mentioned EPUB, is what they do is they encapsulate a particular document. So something like a user guide or an installation guide. When you get a PDF, you know you have the entire thing. And the same thing is actually true with EPUB. It gives you a book, a document. When you go looking at HTML pages, they are intentionally broken down and it can be quite difficult to understand the context like, “What is the scope of this entire document that I’m pulling this little HTML piece out of?” So there’s some advantages there from just a reading point of view.

10:56 AP: Right. Because what you just described, generally, you are either going to use that company’s search engine or Google or some other search appliance to hit that particular piece of content. And the context was, “I type these words in a search.” You really can’t do that with a collection of HTML pages on a device, it just doesn’t work. That’s why the EPUB wrapper, like you mentioned or PDF, is helpful because it’s basically self-contained.

11:25 SO: Yeah. Now, one of the really, really big problems with PDF has to do, not so much with the PDF itself, not the form factor but with the kinds of workflows that tend to go into that. We’ve seen a couple of cases where companies that were delivering PDF only have run into problems where their consumers, and usually this was internal customers like support people or field technicians, like you mentioned, had created these private stashes of PDFs that they had downloaded. This is sort of like, “I need to keep up with this particular PDF, so I’m going to download it and carry it around with me on my tablet.”

12:08 SO: And then two things happened. One is, when that PDF was updated, they weren’t told and they didn’t go looking for it. So, now they have an out of date PDF, which is highly problematic because, presumably, the updates were like, “Do the installation this other way.” Or, “Here’s a new version,” that kind of thing. So a private stash of out of date PDFs. And then what tends to happen with those is that they would pull out the procedures they used the most, actually scrape them into something like HTML or a wiki or even just a text document, and then rewrite them to meet their requirements. So, they were actually correcting the content with what happens in the real world.

12:53 SO: Not only, now, do they have an out of date PDF, but they have a PDF that they have actually… Or content that they have actually revised without telling the people that supposedly own that content. And this is actually a pattern we’ve seen repeatedly with PDF only workflows. And the basic reason for it was that the internal customers couldn’t push changes back and they couldn’t get changes made quickly enough. So they went off and fixed it on their own.

13:21 AP: It was a one way street, essentially.

13:23 SO: Yeah. Which is terrible, because now all these individuals have private knowledge. And the authors, the content creators don’t know about this, aren’t being told. And now I, as a field tech, am reluctant to make updates because… Well, I’ve already made the updates. I don’t want you to overwrite the stuff where I annotated my PDF. So this is really potentially a bad problem having to do with needing to close that feedback loop, and think about what that needs to look like.

13:57 AP: You…

13:58 SO: Sorry, go ahead.

13:58 AP: You just touched on something that made me think of another use case where PDF is required. And a lot of government agencies require PDF, particularly when we’re talking about safety in regard to pharmaceuticals and things like that. So a lot of times you have government agencies saying, “You will provide this content in this format.” And PDF is often that format.

14:23 SO: Right. And so then you have to look at this question of, “Can and should I deliver it in some other format? And how am I going to do that?” A lot of the tools that output print and PDF will also do HTML. I did want to touch on InDesign a little bit because InDesign… It can do EPUB. But getting HTML out of InDesign, that’s really not what InDesign is designed for.

14:52 SO: Yeah. So if you’re doing… If you’re in a PDF only workflow that lives inside InDesign and you need to start delivering web content, then I think you’re probably looking at a tool change. And so we’ve seen this over and over again with industrial data sheets and those kinds of things where groups are using InDesign, for reasons that escape me, because they’re not doing highly designed content. They’re doing very structured, very rigorous content on a page, which is really not what InDesign is for. But that’s what’s there and that’s what they’re using. So that’s a case where we typically look at a changing process. And where does XML come into all of this? Is that a requirement?

15:38 AP: Well, it definitely feeds into it because one of the great benefits of XML is that it enables you to do multichannel publishing. Now, that’s not to say… Like you said, there’s some desktop publishing tools that are pretty adept at putting out different types of outputs. But XML, I think, frees you up even more. You also get into some reuse and some other aspects, too, that XML may do a little bit better than some desktop publishing type applications. But no, it definitely feeds into the whole “I need PDF. I need HTML.” Well, XML, Extensible Markup Language. You can extend that content and create all kinds of different outputs from it. That’s part of the extensibility.

16:26 SO: Right. So if you only need basic PDF and basic HTML, then you… You could look at XML, but it’s not the only option. But if you need a variety of HTML and just a lot of different formats that you’re putting out, then it becomes… The more formats you’re producing, the more compelling XML becomes as a foundational technology, basically.

16:49 AP: And the more languages that you’re producing those PDFs, those websites, et cetera, that will also drive… There are lots of business decisions that will feed into whether or not you move into XML and multichannel, omnichannel, whatever you want to call it, that kind of publishing is a factor. But it is absolutely not the only one.
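
[Editor's sketch, not part of the recording] The multichannel idea discussed above, one XML source feeding several outputs, can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the `topic`, `title`, and `step` element names and the `to_html` and `to_text` functions are invented for illustration and do not come from the podcast or from any particular XML standard. The plain-text renderer merely stands in for a print or PDF pipeline; it does not produce PDF.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source topic; the element names are invented for this sketch.
SOURCE = """\
<topic id="install">
  <title>Install the pump</title>
  <step>Close the inlet valve.</step>
  <step>Bolt the pump to the base &amp; check alignment.</step>
</topic>
"""


def to_html(topic):
    """Render the topic as an HTML fragment (the web channel)."""
    title = escape(topic.findtext("title"))
    items = "".join(f"<li>{escape(s.text)}</li>" for s in topic.findall("step"))
    return f"<h1>{title}</h1><ol>{items}</ol>"


def to_text(topic):
    """Render the topic as numbered plain text (a stand-in for a print channel)."""
    lines = [topic.findtext("title").upper()]
    lines += [f"{n}. {s.text}" for n, s in enumerate(topic.findall("step"), start=1)]
    return "\n".join(lines)


# Parse the source once, then produce both outputs from the same tree.
topic = ET.fromstring(SOURCE)
html_output = to_html(topic)
text_output = to_text(topic)

print(html_output)
print(text_output)
```

Both outputs come from the same parsed tree, so a correction made once in the source shows up in every channel the next time the outputs are generated. Adding another format means adding another renderer rather than maintaining another copy of the content.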

17:10 SO: So, once again, PDF is not dead and not dying.

17:16 AP: It should be in some context, however. Menus. Restaurants, quit doing that, please. I beg you, please.

17:23 SO: And especially scanned PDF ’cause that’s just evil.

17:26 AP: Yeah.

17:27 SO: We also haven’t talked about accessibility. And there are some real problems with PDF, especially in the context of, “We scanned something on a page.”

17:34 AP: It’s a giant picture, essentially.

17:36 SO: Yeah. Which is just a nightmare. But PDF is, I think, no longer enough, right?

17:42 AP: No, it’s not.

17:45 SO: And I think that’s… Is that where we leave it?

17:47 AP: It is. And I think we may have to do a sequel to this a few months, a few years down the road to discuss where PDF is now because the technology may change. Who knows. We’ll see.

[pause]

18:04 AP: Thank you for listening to the Content Strategy Experts podcast, brought to you by Scriptorium. For more information, please visit scriptorium.com or check the show notes for relevant links.

About the Author

Sarah O'Keefe

Content strategy consultant and founder of Scriptorium Publishing. Bilingual English-German, voracious reader, water sports, knitting, and college basketball (go Blue Devils!). Aversions to raw tomatoes, eggplant, and checked baggage.
