Random thoughts about publishing



Palimpsest

 

Next-generation web publishing with Mark Logic

Wednesday, June 20, 2007 — posted by Sarah

Jason Hunter, Principal Technologist at Mark Logic

This presentation distills a set of trends from his experience with implementations at many publishing companies.

Mark Logic has an XML content server. Big-picture feature set is to load, query, manipulate, render.

This is interesting, but have you looked at the future of the book presentation yet? (I really don't want anyone to miss it!)

Trends
The business situation:
"When you buy a plane, you get more paper than plane."

They have a 200-terabyte deal with a government customer.

Some cool things you can do online: Charles Darwin's The Origin of Species with image of original printed book, side-by-side with a web-readable version.

Technology situation
People want to be one step away from answers, not two. Some examples from Google. Another reference to O'Reilly's Code Search tool, which he showed yesterday in the lightning demo session.

PathConsult web site designed for physicians.

Lots of examples of bells and whistles you can do with XML-based content being served up on the web.

There are two ways to make money: create more content OR do more with the content you have. Guess which one is more appealing?

Example: microproducts
Focusing delivery on a specific genre or subtopic. Mark Logic worked with Oxford University Press to develop an African-American studies web site. They aggregated relevant content from many different OUP publications.

Example: custom publishing
Pulling together single chapters from many different books. SafariU lets you do this.

I saw most of these examples in a presentation (at XML 2006, maybe?), but it's still interesting.

Content in context
Example: finding mention of a drug in the conclusions section of an article along with terms such as "contraindications," "should not be given," and/or "may cause death."
Example: electronic flight bags for pilots. Primary purpose is real-time access to weather, but also to find procedures faster.
Example: historical content, such as Congressional Quarterly
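
To make the drug example concrete, here is a minimal sketch in Python of what "content in context" means: the search looks only inside the conclusions section and only reports paragraphs that pair the drug name with a warning phrase. This is not Mark Logic's API; the markup (`section type="conclusions"`), the drug name "Examplomycin," and the function name are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup; element and attribute names are assumptions.
ARTICLE = """
<article>
  <section type="introduction">
    <p>Examplomycin is widely prescribed.</p>
  </section>
  <section type="conclusions">
    <p>Examplomycin should not be given to patients with renal failure.</p>
  </section>
</article>
"""

# Warning phrases taken from the example in the talk.
WARNING_PHRASES = ("contraindications", "should not be given", "may cause death")


def warnings_in_conclusions(xml_text, drug):
    """Return conclusions paragraphs that mention the drug and a warning phrase."""
    root = ET.fromstring(xml_text)
    hits = []
    # Restrict the search to paragraphs inside the conclusions section.
    for para in root.findall(".//section[@type='conclusions']/p"):
        text = "".join(para.itertext())
        lowered = text.lower()
        if drug.lower() in lowered and any(p in lowered for p in WARNING_PHRASES):
            hits.append(text.strip())
    return hits


print(warnings_in_conclusions(ARTICLE, "Examplomycin"))
```

The mention in the introduction is ignored because it sits in the wrong section. A plain full-text search cannot make that distinction.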

Emphasis on Google
Fear: Google owning the content and the user
Opportunity: better search
Opportunity: Instant AdWords registration
Opportunity: Personalized landing pages

Basically, work with Google. You probably can't avoid them. But if you have the source (XML) content, you have the "negative" of the picture. Google only has a printed 4x6 snapshot. Interesting analogy.

Take advantage of your expertise in your vertical to get AdWords before anyone else figures them out.

User participation can be direct, like blogging, reviewing, and tagging. But there's also indirect participation, where search and guidance are provided by collective intelligence. This is the Google model.

Leveraging the structure
You can provide more powerful search than Google because your XML is better than the downstream HTML that Google is indexing.
You can automatically generate master indexes for a collection of your books.
You can generate "special views"; that is, multiple output formats.
Historical analysis lets you look at who is citing a particular document over time.
You can print actual books faster. (This is, of course, our major focus.)
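
As one illustration of the list above, a master index across several books can be sketched in a few lines of Python once the index terms are tagged in the source. This is only a sketch under stated assumptions: the `book`, `chapter`, and `indexterm` elements and the sample titles are invented here, and real book markup would need more care (nested terms, page ranges, sort keys).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markup for two books; all names and titles are assumptions.
BOOKS = [
    """<book title="Book A">
         <chapter n="1"><p>About <indexterm>XPath</indexterm>.</p></chapter>
         <chapter n="2"><p><indexterm>XQuery</indexterm> basics.</p></chapter>
       </book>""",
    """<book title="Book B">
         <chapter n="3"><p>More <indexterm>XPath</indexterm>.</p></chapter>
       </book>""",
]


def master_index(book_sources):
    """Map each index term to the (book title, chapter number) pairs that use it."""
    index = defaultdict(list)
    for source in book_sources:
        book = ET.fromstring(source)
        for chapter in book.findall("chapter"):
            # iter() finds index terms at any depth inside the chapter.
            for term in chapter.iter("indexterm"):
                index[term.text.strip()].append((book.get("title"), chapter.get("n")))
    # Sort alphabetically by term, as a printed index would be.
    return dict(sorted(index.items()))


for term, places in master_index(BOOKS).items():
    print(term, places)
```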

Generate a dumbed-down press summary from an article in the New England Journal of Medicine. Basically, figures, the content referring to the figure, the paragraph in the table of contents that describes the article, and, crucially, letters to the editor in later editions that refer to that article.

What if you don't have structure?
People add it. Sometimes an editor does it, sometimes it's automated.
Need to mark up important information, such as people, places, companies, drug names, and much more.

Government customers are more often focusing on structuring content automatically because they have such vast volumes of information.
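
The simplest form of automated markup is a lookup list: wrap every known name in a tag. The Python sketch below shows the idea only. The name list, the tag names, and the function are assumptions for illustration; it is not how Mark Logic or any particular vendor does it, and real entity extraction has to deal with ambiguity, which a lookup list cannot.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical lookup list mapping known names to the tag to wrap them in.
ENTITIES = {"Oxford University Press": "company", "Darwin": "person"}


def tag_entities(text, entities):
    """Wrap each known name in its tag and XML-escape everything else."""
    # Try longer names first so a long name wins over a shorter one inside it.
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    out, pos = [], 0
    for match in pattern.finditer(text):
        out.append(escape(text[pos:match.start()]))  # untagged text before the name
        tag = entities[match.group(0)]
        out.append(f"<{tag}>{escape(match.group(0))}</{tag}>")
        pos = match.end()
    out.append(escape(text[pos:]))  # remaining text after the last name
    return "".join(out)


print(tag_entities("Darwin & Oxford University Press", ENTITIES))
```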

Structure in fiction: What if you could identify characters, plot points, and locations, so that you can, for example, search Tolkien's volumes for a certain character's appearance? (I think the argument for structure in fiction is weak, but certainly for fiction that people study, there's a possibility.)

Content analytics
What is the shape of the "haystack"? Example from O'Reilly: 309,000 pages and 123,439 blocks of code examples. Getting numbers like these is much easier than with file-based management or siloed content.
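
For a sense of how such counts fall out of structured content, here is a small Python sketch that tallies element types in a document. The sample markup and element names (`para`, `programlisting`) are assumptions, and the numbers it produces have nothing to do with O'Reilly's figures.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample; element names are assumptions for illustration.
SAMPLE = """<book>
  <chapter><para>Text</para><programlisting>print(1)</programlisting></chapter>
  <chapter><para>More</para><para>Again</para><programlisting>print(2)</programlisting></chapter>
</book>"""


def shape(xml_text):
    """Count how many times each element type occurs, root included."""
    return Counter(element.tag for element in ET.fromstring(xml_text).iter())


counts = shape(SAMPLE)
print(counts["programlisting"], "code blocks in", counts["chapter"], "chapters")
```

With content scattered across files in different formats, even a count like this means opening every file.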

Agility
Must implement fast. (Not sure I agree with this. The consequences of making a mistake can be quite unpleasant.)

The most difficult task in implementing at O'Reilly was gathering up the content and undoing the various shortcuts people had taken.

The cheapest way to get structure into the content is to off-shore the conversion. (That raises the more interesting question of whether it's feasible to get authors to create structured content in the first place. For technical publications, the answer is clearly yes. For book publishing, most publishers haven't tried to push XML-based authoring onto their content creators.)



Scriptorium Publishing | Post Office Box 12761 Research Triangle Park, NC 27709 | (919) 481 2701 | info@scriptorium.com