November 30, 2017

Full transcript of cancer staging podcast

00:01 Gretyl Kinsey: Welcome to The Content Strategy Experts podcast, brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize, and distribute content in an efficient way. In episode 18, we discuss faster content, better healthcare, and the role content interoperability plays.

Hello and welcome to The Content Strategy Experts podcast. I’m Gretyl Kinsey and I’m a technical consultant with Scriptorium. And today I have a special guest with me. So, go ahead and introduce yourself.

00:32 Laura Meyer Vega: Hi. I’m Laura Meyer Vega, I’m the managing editor of the American Joint Committee on Cancer, Cancer Staging Manual, as well as the project manager for the American Joint Committee on Cancer, and the de facto content manager for the Cancer Staging Manual content as well. And the AJCC, American Joint Committee on Cancer is housed in the American College of Surgeons in Chicago, Illinois.

00:56 GK: And we are broadcasting actually from LavaCon, Portland 2017. We’re actually both in the same place at the same time which is unusual ’cause with Scriptorium I’m typically based in Durham, North Carolina and Laura you’re…

01:10 LV: In Chicago.

01:10 GK: Chicago. So it’s unusual but good that we’re both here. And we’re both here because we were actually presenting a case study on a really interesting project that we got to do. So, that’s kind of where I wanted to start with my first question is let’s try to just do as brief and un-technical as possible…

01:28 LV: I’ll try. [chuckle]

01:29 GK: Overview of what we did.

01:32 LV: I’ll give a little background on what cancer staging is. AJCC writes the rules for… When a person says something like, “I have stage four cancer, and I have one year to live.” We write the rules, the biological rules that go into determining what an advanced cancer is or late-stage cancer if you will. And that information also is used to prognosticate how long a person is expected to live over a period of five to 10 years, or less or more based on population statistics. And cancer staging is the foundation of the documentation and measurement of all of this information, and it feeds a cycle of treatment. A patient’s treatment plan is determined based on their cancer stage, and their eligibility for clinical trials is based on their cancer stage, as well as surveillance of public health issues like cancer. Cancer is a major public health concern. And the incidence of cancer is recorded across the nation, across the world, and used to measure the rate of cancer over populations. And then finally, that information is analyzed and used to justify or rethink the cancer staging process all over again. And so, AJCC just published its eighth edition of the Cancer Staging Manual which we’ve been publishing for over 40 years now.

02:58 GK: And so, we had a unique problem that Scriptorium worked with the AJCC to help solve with that Cancer Staging Manual.

03:05 LV: Right. For the past 40 years and more, we’ve only delivered this content in a print book, and in recent years, and when we published our last edition, the seventh edition in 2009, we had a lot of feedback from physicians and data collectors that this information… And let me state here too that we stage over 100 cancers, 100 distinct diseases, and all of them are unique. And it’s getting bigger and bigger as we learn more about disease and human biology. And so from clinicians, and data collectors, and software developers who were trying to create tools for clinicians and data collectors, we got a lot of feedback that the book wasn’t enough. That it was difficult to go to a desk resource to recall this information when they had electronic medical records at their fingertips and they could find this… Or they wanted to find this information at the point of care in their electronic systems. In around 2010, we embarked on this mission to see how we could make our content useful to both humans and machines. So that the people who were building tools based in machines, electronic medical record software, were getting the content in a format that they could easily subsume. And that our content would be pulled from a single source of truth, so that there was no more human interpretation of what’s in a print book.

04:35 GK: So this really led to the idea of getting into structured content with DITA XML, so that’s kind of where Scriptorium came into the picture.

04:45 LV: Right. We heard from a lot of developers that XML was the way to go and specifically DITA XML because we primarily wanted to publish this content, and self-publish this content at some point, but we needed that infrastructure, and DITA would also serve the purpose for software developers as well.

05:03 GK: And this brought us to an interesting solution, that I’m going to go over this as briefly as possible. We do have a case study that’s published, it’s a joint effort with Scriptorium and easyDITA. We both worked with the AJCC on this. So, I’m going to let that be what has all the details and I’ll just try to go over as quickly as possible for the purpose of the podcast. But what we ended up doing was creating a DITA specialization that semantically captured all of the different information in the Cancer Staging Manual, and in particular, the factors that are used to diagnose cancer, and that required a very unique DITA specialization.

05:41 LV: Right. Those factors that are used to derive stage are T, N, and M, Tumor, Node and Metastasis, and some other non-anatomic variables like blood test results, things like that, I won’t get too specific. All of that information compounds and permutates to derive a cancer stage from least advanced to most advanced. And it’s that information that software developers in particular were very interested in because they needed to build calculators for their physicians, and needed to build forms for physicians and other data collectors at the hospital level. In specializing this specific information, we were able to create an API that would pull information from these specific tags that could be piped to software developers.

06:29 GK: And I’ll back up a little. Just for those of you who may not be familiar, specialization is the ability to take default DITA elements and create custom elements from those, basically using the same structure. And what that allows you to do is have specialized semantically named elements so that you can get a lot more value out of the different elements that you have, and you can actually capture some of the naming. So for example, with the cancer staging, you can actually have an element that tells you, this is the definition of T, N, and M, instead of just having a kind of standard element that doesn’t tell you what’s inside of it.

07:08 LV: Right. The biggest thing we needed to establish was context around each of our elements, and then specialize those tags so that we could pull from within a context. Our content is primarily divided between narrative text and tables. And definitions for specific data elements that are used to derive stage are primarily presented in tables. And to make our tables unique and to let our users be able to call these specific definitions, these specific tables from an API, Gretyl helped us by specializing these table elements. I think you went from a simple table element the…

07:51 GK: We did, yes. We took the simple table element in DITA, and all of the kind of sub elements within it for the rows and the cells. And we created specialized versions of that, so that you have for example instead of an element called simple table, you have something called stage table. That’s the table that contains all of the staging rules for a particular disease.

08:15 LV: Yeah, the combinations. When T is this and N is this and M is this, then the stage is this.
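
The combination rule described here ("when T is this and N is this and M is this, then the stage is this") can be pictured as a lookup table. The following is only a minimal sketch in Python: the category values, the stage labels, and the names `EXAMPLE_STAGE_TABLE` and `lookup_stage` are all made up for illustration and are not actual AJCC staging rules or part of any AJCC API.

```python
from typing import Dict, Optional, Tuple

# Placeholder rows for illustration only; these are NOT actual AJCC staging rules.
# Each key is one (T, N, M) combination and each value is the resulting stage group.
EXAMPLE_STAGE_TABLE: Dict[Tuple[str, str, str], str] = {
    ("T1", "N0", "M0"): "I",
    ("T2", "N0", "M0"): "II",
    ("T2", "N1", "M0"): "III",
    ("T2", "N1", "M1"): "IV",
}


def lookup_stage(t: str, n: str, m: str) -> Optional[str]:
    """Return the stage group for one T/N/M combination, or None if no row matches."""
    return EXAMPLE_STAGE_TABLE.get((t, n, m))


print(lookup_stage("T2", "N1", "M0"))  # prints: III
```

Returning `None` for an unknown combination, rather than guessing, reflects the point made in the conversation that the table alone may not carry every rule.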

08:22 GK: And what that allows the API to do is instead of just seeing a simple table and saying, “Okay, well this is just an element called simple table, that doesn’t tell us anything about what’s in that table.” When they see a table element that’s called stage table, and they can see what disease that’s a part of, then it knows that’s the information that needs to be called out.
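
To make the difference between a generic table and a semantically named one concrete, here is a rough sketch in Python using the standard library XML parser. The element names `stagetable`, `stagerow`, and `stageentry`, the sample topic, and the function `extract_stage_rows` are assumptions for illustration; the actual element names in the AJCC specialization are not given in this conversation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topic. The generic <simpletable> says nothing about its
# contents; the specialized <stagetable> (an assumed name) identifies staging rules.
SAMPLE = """
<topic id="example-disease">
  <title>Example disease</title>
  <body>
    <simpletable>
      <strow><stentry>Some</stentry><stentry>generic table</stentry></strow>
    </simpletable>
    <stagetable>
      <stagerow>
        <stageentry>T1</stageentry>
        <stageentry>N0</stageentry>
        <stageentry>M0</stageentry>
        <stageentry>I</stageentry>
      </stagerow>
    </stagetable>
  </body>
</topic>
"""


def extract_stage_rows(xml_text: str):
    """Return the topic title and the cell text of every row in every stage table."""
    root = ET.fromstring(xml_text)
    disease = root.findtext("title")
    rows = []
    # Only elements named stagetable are visited; generic simpletables are ignored.
    for table in root.iter("stagetable"):
        for row in table.iter("stagerow"):
            rows.append([(cell.text or "").strip() for cell in row.iter("stageentry")])
    return disease, rows


print(extract_stage_rows(SAMPLE))
```

Because the selection is by element name, the generic table in the same topic is skipped without any guesswork about what it contains.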

08:45 LV: Right. And on top of that, in building our API, we created what we call a staging elements API, that would pull the specific things that a software developer would need to drive to the appropriate content. It starts with the disease, the disease name, the anatomic location, and then you get your T, N, M definitions and your stage groups. And they can use these bits and pieces to build the tools that they need instead of us AJCC, imposing a workflow on their customers. That’s one thing that we didn’t want to do because the EHR, the Electronic Health Record companies, their customers are unique to them. And not one hospital is alike in their workflow. So we wanted to give the software developers the ability to build a workflow that met their customers’ needs and not impose one.
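
One way to picture the pieces listed here (disease, anatomic location, T, N, and M definitions, and stage groups) is as a single record that a developer assembles into whatever workflow suits their customers. This is a sketch under stated assumptions: the class `StagingElements`, its field names, the function `form_choices`, and all sample values are hypothetical and do not describe the real API's response format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StagingElements:
    """Hypothetical container for the pieces a developer might receive for one disease."""
    disease: str
    anatomic_location: str
    t_definitions: Dict[str, str] = field(default_factory=dict)
    n_definitions: Dict[str, str] = field(default_factory=dict)
    m_definitions: Dict[str, str] = field(default_factory=dict)
    stage_groups: Dict[Tuple[str, str, str], str] = field(default_factory=dict)


def form_choices(elements: StagingElements) -> Dict[str, List[str]]:
    """Build the option lists a data-entry form could offer for T, N, and M."""
    return {
        "T": sorted(elements.t_definitions),
        "N": sorted(elements.n_definitions),
        "M": sorted(elements.m_definitions),
    }


# Placeholder content only; not actual AJCC definitions.
EXAMPLE = StagingElements(
    disease="Example disease",
    anatomic_location="Example site",
    t_definitions={"T1": "placeholder definition", "T2": "placeholder definition"},
    n_definitions={"N0": "placeholder definition", "N1": "placeholder definition"},
    m_definitions={"M0": "placeholder definition", "M1": "placeholder definition"},
    stage_groups={("T1", "N0", "M0"): "I"},
)

print(form_choices(EXAMPLE))
```

The record only supplies building blocks; how a form, calculator, or other tool arranges them is left to the developer, which matches the goal of not imposing a workflow.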

09:41 GK: And how can this overall help lead to improvements in diagnosing cancer cases?

09:47 LV: Well, let’s go back to the original reason we got into this. When software developers were interpreting content from the manual itself, the staging rules weren’t necessarily inherent, and still aren’t necessarily inherent, in the way the tables are represented in each chapter. There’s a lot of nuance that’s described in the narrative text, and in the initial General Rules for Staging chapter or chapter one in our book. And so what the developers were doing, and actually a lot of medical specialists as well and data collectors, were assuming that the main information they needed was captured in those tables, but they were missing those nuances. And so that translation from the page to software was introducing an opportunity for human error. And this human error had the possibility of affecting the accuracy of the data collected on a patient. And not just the data collected on the patient, but the information surfaced to a physician or a clinician when they go to enter information into the medical record.

11:00 LV: So there was an opportunity to compromise the actual information being captured about a patient. And maybe their… It could be their clinical treatment would be compromised down the line, I don’t think any of this happened but their clinical treatment. That surveillance data I mentioned earlier that’s collected on populations, information from databases and published journal articles on clinical trials. All of that information is used to feed our research on how to improve cancer staging. We wanted to eliminate this problem as early as possible in the process, and we found that our best way to do this was to give accurate content, explicit content in the way that those software developers could consume it, and eliminate that need for human interpretation.

11:51 LV: And just one other point about the human interpretation, there are a number of electronic medical record systems out there. And I think about 5,000 hospitals in the US alone, and many of them treat cancer patients or see cancer patients. And they all use different medical record software. So you have developers opening a book and coding something for a specific hospital or a doctor office solution, it’s that number of people who could have different interpretations of what the context of our staging rules were. And it’s 2017, and we have the responsibility to eliminate that as much as we can.

12:32 GK: Right. And so this solution just helps improve consistency and accuracy across the board.

12:38 LV: Yes. And hopefully down the line, all of this information if it’s accurately collected and accurately reported, and patients are accurately treated, this only helps support changes to cancer staging. Maybe a patient who appears today with a late-stage disease, maybe there’re better treatments that could be applied to their disease, and their outcome becomes much, much better. And down the line a patient, maybe five years from now or 10 years from now who maybe presented with a stage three disease, with the research and the clinical trials and the treatments that are applied to that stage three patient, a patient presenting with the same disease five or 10 years down the line, maybe they’re a stage two or a stage one, and the outlook isn’t so dire for them.

13:30 GK: Exactly. And that would be really a marked sign of improvement. Really to show that this has actually made a difference and helped people.

13:38 LV: Right.

13:40 GK: So another thing I wanted to talk about as well, and this is where I think you were going, but looking ahead to the future, what kinds of goals are you looking toward on this project as we go toward the ninth edition of the Cancer Staging Manual?

13:57 LV: I just touched on the patient care goals. And this process of treatment and surveillance and analysis, it might help the world change how patients are treated. Some patients might be overtreated right now, which isn’t good for them, they might be under-treated right now. And I hope we get to find a balance in the near future on the appropriate treatment for patients. For the business aspect of it, and I don’t want to say we’re a business, we’re a non-profit, I think we could get a lot of efficiencies in how we develop content, and there’s definitely an opportunity to clarify content. Our authors are all volunteers, we had about 450 contributors to the manual from all over the world. We had 83 chapters in the manual and written by different experts, on different body parts, on different organ systems within the body. And they all have a different way of speaking and a different way of writing. I think one of the next steps in the next two to three years is to examine the actual content, the narrative content, and make it a little bit more straightforward, so people don’t skip over that content and go straight to the tables.

15:14 LV: This book has evolved over a number of decades, and because there are so many different unique voices in it, each chapter in past editions was sometimes unique in its own content. There were very similar topics and headings among all of the chapters, but some had unique stuff like an FAQ document, or a big discussion of one particular part of a disease that was more of a dissertation and not necessarily a part of a chapter; it was maybe better for our website or better for a journal article. And so as part of this project, we came up with chapter templates, where there was the minimum required content for every chapter and clear descriptions of what those topics should address. And that got us really far along the way to component content management.

16:08 GK: Because that kind of template that Laura is describing is what we needed for Scriptorium to develop that specialization, to get everything into a structure like DITA. Content that’s in an unstructured format can have an implied structure, so you could have a consistent set of headings and things as Laura was saying. But when it comes to actually putting it into something like DITA, that’s an enforced structure, so the more consistent that that writing can be on the front-end when it’s still unstructured, the more straightforward it is to take it over into an enforced structure like DITA.

16:41 LV: Right. And from the human perspective as well, something we learned at this conference just a couple of days ago at the LavaCon Conference, was that the human attention span measured by Microsoft in 2002, was 12 seconds. And 12 years later in 2014, they used the same methodology to measure human attention span and it had gone down to eight seconds. Based on how people were consuming content, with multi screens, multi tabs open, all this multitasking that we’re doing because we get content from every direction. And the same goes for clinicians and for cancer registrars. For every human that has to interact with our book, the time and attention span isn’t there any more to pore over all of this new content.

17:31 GK: And that actually ties a lot into the angle we took when presenting the case study which is faster content. We have to think about that attention span and how to make sure we’re keeping up with that because as Laura said, eight seconds. One thing we learned is that that’s actually one second shorter than the attention span of a goldfish. So that’s the kind of challenge we’re working against now.

17:52 LV: I do think we retain more than goldfish.

17:54 GK: I believe so.


17:56 LV: I hope so at least. But yeah and I think that attention span shows just exactly what our readers were telling us. That they were going directly to these tables, which are a lot more visually appealing and visually absorbable than our narrative text which gave them all of the context.

18:15 GK: Another future consideration that we’ve talked about is going to different outputs other than just the API. And that gets into one of the big benefits of DITA which is the ability to have a single source of truth, and to centralize all of your content in one place and then just publish to all of the different types of outputs you need from that single source. So this is kind of the goal for you, right?

18:41 LV: Yeah. It’s the ultimate goal. We do want to self-publish at some point. We couldn’t do that with this edition, we just didn’t have the resources to self-publish. But for now, our content management system feeds the API, and our primary customers of the API, actually our sole customers of the API I believe are software developers. And the content in our content management system is somewhat different from what’s in our actual textbook. We have 83 chapters in our book, but 104 diseases. There’re some chapters that cover three distinct staging paradigms, staging pathways. I’ll get specific. The larynx has three anatomic sites, the supraglottis, the glottis, and the subglottis. And they have different T categories. And so each of those have a unique staging pathway in the API. And so I think we can start to work from those unique staging systems when we go to author again, or to revise content moving forward with our subject matter experts. And then hopefully print organ system specific, like the gastrointestinal tract that we can publish smaller modules of the Cancer Staging Manual. Because right now it’s 1,000 pages. Medicine is so specialized right now that not every doctor is going to want a 1,000 page manual when they only treat lower gastrointestinal cancer. We want to be able to give modular content to our customers.
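
The gap between the chapter count and the disease count comes from chapters that contain more than one staging pathway. A minimal sketch in Python of that one-to-many relationship follows. The larynx sites are the three named in the conversation; the second chapter, the name `CHAPTER_SITES`, and the function `staging_pathways` are made-up placeholders, not the structure of the actual content management system.

```python
from typing import Dict, List, Tuple

# The larynx entry follows the three anatomic sites named in the conversation.
# The second chapter is a made-up placeholder for a chapter with a single pathway.
CHAPTER_SITES: Dict[str, List[str]] = {
    "Larynx": ["Supraglottis", "Glottis", "Subglottis"],
    "Example single-site chapter": ["Example site"],
}


def staging_pathways(chapters: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten chapters into (chapter, site) pairs, one pair per staging pathway."""
    return [(chapter, site) for chapter, sites in chapters.items() for site in sites]


pathways = staging_pathways(CHAPTER_SITES)
print(len(CHAPTER_SITES), "chapters,", len(pathways), "staging pathways")
```

Keeping the pathway, not the chapter, as the unit is what would let smaller modules be assembled later, for example by grouping pathways by organ system.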

20:16 GK: And that also feeds back into the idea of that human readability and attention span that you’re basically serving up just what’s needed and not all of this excess, so that there’s not that need for them to have to spend their time and their attention filtering through unneeded content.

20:31 LV: Right. And first step was just getting this big book into smaller chunks. And in the future we can plan our business model around smaller modules for our customers, our readers.

20:47 GK: So I just want to also ask you what all else you’ve learned while you’ve been here at LavaCon?

20:52 LV: A few years ago when we first started dipping our toes in this concept, and when I say a few years ago I mean like five or six years ago. I can’t believe how much time has flown by. Everybody was talking about Content 3.0, about content being broken up into chunks and being subsumed by multiple users. And now the whole thing is around Content 4.0 and how content can be in even smaller chunks, and your consumer, your user, your customer actually becomes a conducer, someone who takes your content and then builds from there. And that’s exactly how I see our customers of our API, they’re conducers. They take our content and build tools off of it. I’ve known that that’s a thing, now there are terms around it.

21:37 GK: Exactly.

21:37 LV: The other part, or the other discussions that I’ve heard, and we’re still as an organization way far out from this. And we’re not really… We’re not as big as some of these companies who use Chatbots and things like that. I think this micro content and web support, I see a future for that in some of the products that we offer, but we’re a really small organization and we have a small constituency. And I think that human element is going to need to be there for a while. But the stuff that’s been clicking through my head is we need to start thinking of our content and turning it on its head so that we are anticipating customer questions, and we have responses to that. Or even documenting customer questions so that we’re giving them consistent responses. ‘Cause that’s something I don’t think we do very well now when we don’t have a ticketing system of any kind. There’s basically two people who answer these questions. I answer content structure questions and my colleague Donna Gress answers the scientific questions. And she has to reach out to physicians to answer more detailed questions. But we really need to start cataloging some of these responses because if we expect our customers to consistently use this information, we need to give them consistent support and consistent answers. I don’t think we’ll get to a Chatbot but some of those principles behind Content 4.0 could definitely help us in our endeavor.

23:09 GK: Absolutely. Well thank you so much Laura for joining me and for having a lot of fun at LavaCon with presenting and learning all kinds of new things. I think it’s been a really interesting and excellent conference and great experience for all of us.

23:24 LV: I regret not going to karaoke.

23:27 GK: I regret you not going to karaoke too.

23:28 LV: I know. [chuckle] Guys if you ever go to LavaCon, do not pass up on karaoke even if it’s 9:30 at night and you have the sleeping patterns of a 60 year old woman when you’re only 35. [chuckle] Just go. Yeah. If you ever get a chance to come to LavaCon, do it and take entire advantage of it.

23:49 GK: Well thank you again, and we will have links in our show notes especially to that case study. And thank you all for listening.

23:57 LV: Thank you.

24:00 GK: Thank you for listening to the Content Strategy Experts podcast brought to you by Scriptorium. For more information, please visit or check the show notes for relevant links.