The devolution of DITA editors
Most of the DITA work that we do at Scriptorium is “full-on” implementation. That is, our customer decides to move their content from [something that is not DITA] to a DITA-based system. There are variations on the theme, of course, but nearly all of our customers are concerned about managing localization costs and increasing content reuse.
Perhaps this is why I find the recent flurry of “limited DITA” editors so puzzling. Companies are producing DITA editors that are marketed as easy-to-use, “Word-like” or even embedded inside Word, and generally less challenging to use than a dedicated XML editor.
The trade-off is that these editors do not support all of the DITA tagset or architecture. For example, I recently saw an announcement for Codex and downloaded it to take a look. Codex is an AIR application, so that’s fun.
In the first 30 seconds, I noticed the following:
- Codex lets you create generic topics. You cannot create task, reference, or concept topics.
- I don’t think you can create codeblock elements, tables, or conrefs.
In short, although what you can create is technically valid DITA, it’s an extremely limited subset of DITA. Let’s call it babyDITA.
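To make the generic-versus-typed distinction concrete, here is a minimal sketch in Python, not anything Codex or another editor actually ships. It reads a topic and reports its root topic type and whether it uses the constructs listed above. The sample topics, the `summarize_topic` helper, and the `shared.dita` conref target are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Topic types named in the post; plain "topic" is the generic one.
SPECIALIZED_TYPES = {"task", "concept", "reference"}
# Constructs the post suspects the limited editor cannot create.
RICH_ELEMENTS = {"codeblock", "table", "simpletable"}


def summarize_topic(xml_text):
    """Report the root topic type and which richer constructs appear."""
    root = ET.fromstring(xml_text)
    used = {el.tag for el in root.iter() if el.tag in RICH_ELEMENTS}
    has_conref = any("conref" in el.attrib for el in root.iter())
    return {
        "topic_type": root.tag,
        "specialized": root.tag in SPECIALIZED_TYPES,
        "rich_elements": sorted(used),
        "uses_conref": has_conref,
    }


# Hypothetical generic topic: valid DITA, but no information typing.
GENERIC = """<topic id="install">
  <title>Installing the widget</title>
  <body><p>Unpack the box, then plug it in.</p></body>
</topic>"""

# Hypothetical task topic with steps, a conref, and a codeblock.
TASK = """<task id="install">
  <title>Installing the widget</title>
  <taskbody>
    <steps>
      <step><cmd>Unpack the box.</cmd></step>
      <step><cmd conref="shared.dita#shared/plug-in"/></step>
    </steps>
    <example><codeblock>widget --start</codeblock></example>
  </taskbody>
</task>"""

print(summarize_topic(GENERIC))
print(summarize_topic(TASK))
```

Both samples are technically DITA, but only the second carries the typing and reuse hooks that the rest of this post is concerned with.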
Here’s my question: Is there actually a use case and a potential customer base for editors that create babyDITA? If you’re going to restrict the supported elements to this level, why not just let your contributors author in (X)HTML and convert it to DITA later?
Meanwhile, there is also the category of “Word-based” DITA editors. (I’ll try to set aside my aversion to Word long enough to consider this approach.) These editors, such as Quark XML Author and Simply XML, are marketed as software that “lets anyone easily create XML in Word with no knowledge of XML and little or no training” (Quark) or “without worrying about mark up or learning complex new software.” (SimplyXML) These companies are trying to use Word as a sort of Trojan horse for XML—the authors get to keep authoring in Word, but now they’re creating DITA (or other XML) in their favorite (?) word processor.
I think there’s a market in there somewhere, but I don’t think it’s as big as these companies think. Implicit in the “keep authoring in Word” argument is that putting authors in a different tool is too hard or too expensive. It’s true that XML editors are more difficult to use, but this is only in part the fault of the tools and their relative lack of maturity. The second problem is that creating properly organized, high-value content is much harder than putting crap on a page. Word definitely encourages the latter. So even moving up from creating typical Word documents into Word-based DITA is going to be a huge leap. If you don’t understand the underlying DITA structure, you’re just going to continue to create badly structured information that is technically valid.
If your organization needs high-value content—information that is useful, accurate, and reusable—you will need to invest in the appropriate tools, technologies, and process. If you don’t need high-value content, you should continue doing whatever you are currently doing and ignore DITA and its brethren.
Finally, I think that the “Word-based” editor people are solving the wrong problem. The future is in the cloud with web-based editors, and you might want to take a look at two interesting products there: XOpus and easyDITA.
What are your thoughts on babyDITA editors? Do you see a need for them?
Not sure about babyDITA but there is a need for something that helps non-professional technical writers to collaborate or submit content to a larger body of work. This could be done by supplying those folks with DITA-based templates, through a wiki, or by possibly using a subset of DITA in a WYSIWYG-simulated environment that later gets morphed into full-bodied DITA by professionals.
Something so limited that you can’t even create tables? I don’t see a need for it, but someone did.
Thanks for this post!
I think this sentence is the key:
“If you don’t understand the underlying DITA structure, you’re just going to continue to create badly structured information that is technically valid.”
I’ve seen this happening whenever people use the WYSIWYG views of XML editors (e.g. XMetal) – babyDITA editors are even worse than this. I mean, if you cannot even create a task, why write in DITA at all?
In the end an author needs to understand and write in XML if he/she wants XML’s benefits – this is what I believe…
Wow, I was feeling like the lone ranger on this having learned (incredulous!!) that people will do this–implement DITA sans information types (everything’s a “topic” and you can only use certain elements), being more interested in their style sheet than information management or architecture. I guess it’s fine if you’re creating things that are out of the “reuse pool”, but then why bother? It’s not DITA, not even babyDITA, it’s a mutant, an impostor.
I’m very interested in easyDITA; we’re envisioning a model where the initial set-up and authoring is either done by writers and then the SMEs turned loose on the topics, or the SMEs learn to use templates that the writers create.
To me, there’s no sense implementing DITA at all if you’re not going to reuse and type information, which is where the real value is.
As Julio said, the use case is so that we can make use of content that’s crowd-based or community-sourced (pick any term you like) without having to reformat it to make it play nice with DITA, or with any other form of structured authoring.
I agree with you that the likes of XOpus and easyDITA are well positioned to succeed, and to be very useful tools (which, alas, isn’t always the same thing as succeeding) in the future.
I agree, Larry and Julio, that there’s a use case for crowd-based information. But if you’re going to author babyDITA, why not just author XHTML instead? I don’t see a lot of value in babyDITA. And I think it’s a hard sell to Joe the Average Contributor.
We should call these devolved editors D’A (pronounced “duh”), since they took the information typing out. 😉
We are very excited about the potential for DITA at the enterprise level. DITA and XML are only a means to better communication and we agree with you that there is no place, with or without DITA, for “crap on a page.” High-end Word-based tools from Simply XML and Quark allow organizations to create structured content at the source rather than through conversion and rewriting, which is expensive, and hundreds of millions of people are already comfortable using Word. With an XML architecture, these tools appropriately constrain the authoring options while preserving the familiar Word user interface. No matter what tool authors are using, though, they need to understand how to write so that information consumers benefit. We hope DITA will become a standard and that the complementary Word-based, browser-based, and high-end editors will all contribute meaningful content that improves business results.
Ah, my trackback got here before I could! I had to respond, of course, but I had to start over on my blog because I had more to say than would fit here. What the trackback above says…
And now we come to the part where I disagree with Don Day about something related to DITA (gulp). In his post (see trackback in #8 above), he says:
The point I’m trying to make is that I don’t think this is going to happen. High-value content needs structure and semantics. Low-value content doesn’t. And I think that for in-between content, XHTML is probably good enough. Furthermore, what’s the real difference between XHTML and Julia’s D’A standard? Seems as though if you take out the more sophisticated DITA elements and the reuse architecture, you’re basically left with XHTML. I do not believe that DITA, or D’A, is going to become a widespread authoring standard for content in the enterprise.
I should also clarify that the two Word-based editors occupy a different space in that they support more (most? all?) of the spec. Here, the logic is that people like working in Word, so these tools will accommodate them. I happen to think personally that this sounds like hell, but apparently millions of people disagree with me. 🙂
I understand your doubts about the case for lower-level DITA beyond ID, Sarah. Here are some additional things that shape my thinking:
1. These lower-level tools map very well to Level 1 of the DITA Maturity Model (http://dita.xml.org/wiki/level-one-topics). Michael and Amber point out that semantic richness might be lost, but even so, there are gains over XHTML: separation of presentation from content, rigor in the authoring tools, modularity, reuse, multiple outputs, conditional views, full processing interoperability with other DITA in the company, and more. These benefits alone start to pull even simple DITA ahead of XHTML.
2. Eliot Kimber’s DITA for Publishers project has given him some insights on DITA’s use by writers who are just creating content for publishers. At the end of the day, semantics get stripped out at the FO step, and his tool can happily ingest any quality of DITA for production. This recap doesn’t exactly say that, but it does list some benefits that accrue for publishing regardless of the level of DITA: http://justwriteclick.com/2010/02/25/dita-for-publishers-with-eliot-kimber/ .
3. Bernard Aschwanden has been an early and vocal supporter of DITA subsets (http://www.publishingsmarter.com/resources/books-and-articles/subsetting-dita4000). While I’ve had reservations about the cleanness of the subsetting approach to reducing DITA complexity for casual writers, I can’t argue with the satisfaction of his clients. The constraints feature in DITA 1.2 finally gives consultants a way to model the features in a schema to provide the benefits that Bernard lists.
4. Finally, tying back to your closing comment about Word-based editors: The IBM Semiconductor DITA project, which you can find videos of on the Web, happens to be one proof point for the Word/Quark-based solution space. I saw the usability assessments for their target audience of engineers, and the architects did their homework–I can’t fault their choice.
All of which has brought me, bullish as I am about the full capabilities that DITA can provide, to accept that this is a movement that I want to help have some legitimacy, perhaps with a defined “profile” of elements and behaviors that represent a useful and reasonable stretch goal for the implementors in this space.
And for the sake of DITA’s future beyond Tech Pubs, I believe that these previously untargeted users are likely to represent some new communities of use for Heavier DITA. How will we know unless we provide them tools that can expose them to the DMM ROI benefits in the first place? It’s a vision, not a certainty, but we’ll see how it shapes up.
Laurens van den Oever
Thanks for your interesting post, Sarah. I see the same trend: authoring tools being dumbed down for the sake of ‘usability’. I don’t think there is a use for babyDITA. However, I do agree with Don Day and see value for XML/DITA authoring tools that focus on a larger audience than Tech Pubs. That exact need was the reason we created Xopus many years ago.
One of the roles I see for friendly XML/DITA authoring tools is side by side with the dedicated XML tools for the tag wrestlers like XMetaL, Arbortext and Oxygen. Tech Pubs do not live on an island; they have to work and exchange information with engineers, subject matter experts and native language reviewers. The few friendly authoring tools that do support the full DITA set can make these groups part of the DITA workflow and completely eliminate the need to convert content (or comments in PDF or on paper).
You do have to be aware that different authors have different skills. And you’re right, the casual contributors probably do not have the skills to create well structured DITA by themselves. But they do when we help them. Tool configuration can make choices for them like allowing text in a section or only in a paragraph or which elements you want to expose to them.
WYSIWYG helps them to create valuable structure in their knowledge domain. I’d rather have an engineer start an accurate specs table that may need a bit of editing than having an endless conversation between the engineer and the tech writer about accurately representing a specification.
And you don’t need to expose the whole structure, only the elements and parts a particular author can give valuable input on. For instance the tech writer may insert index terms, but those are no use to the casual contributor, so you hide them. They must remain in the content to support the workflow and allow roundtripping between different DITA tools, but there is no need to expose the full complexity of the content to the casual contributors.
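As a rough illustration of that idea, and not a description of how Xopus or any other editor is implemented, the sketch below builds a simplified view of a paragraph for a casual contributor. It hides `indexterm` elements in a copy of the content while the original tree keeps them for roundtripping. The `HIDDEN_FOR_CASUAL` setting and the `casual_view` helper are hypothetical names.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical per-role setting: elements a casual contributor never sees.
HIDDEN_FOR_CASUAL = {"indexterm"}


def casual_view(root):
    """Return a copy of the tree with hidden elements removed.

    The original tree is left untouched, so the hidden markup is
    still there when the content goes back to the full editor.
    """
    view = copy.deepcopy(root)
    # Snapshot the elements first so removals cannot disturb the walk.
    for parent in list(view.iter()):
        previous = None
        for child in list(parent):
            if child.tag in HIDDEN_FOR_CASUAL:
                # Keep the text that followed the hidden element.
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return view


SOURCE = "<p>Press the <indexterm>power button</indexterm>power button to start.</p>"

original = ET.fromstring(SOURCE)
simplified = casual_view(original)

print(ET.tostring(simplified, encoding="unicode"))
print(ET.tostring(original, encoding="unicode"))
```

A real tool would also have to merge the contributor's edits back into the full content, which this sketch does not attempt.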
> It’s true that XML editors are more difficult to use, but this is only in part the fault of the tools and their relative lack of maturity.
I think the difficulty of using XML editors is entirely due to their lack of maturity AND to the lack of effort that goes into configuring them for different audiences.
Anyone can edit XML with a good tool that is configured to expose only the editing options relevant for their knowledge domain.
I haven’t looked at the babyDITA tools, but what I wonder is – how reusable is this content? While I see a benefit in a simple tool for SMEs to author in (imagine if the functional spec describing a feature were written in reusable DITA?) what I wonder is:
1 – How often are these internal documents actually USABLE as-is? Very seldom in my experience, so direct reuse is out of the question.
2 – How often is marketing collateral reusable as-is? Probably more frequently than internal documents, but again, that’s mostly marketing-to-marketing reuse. Marketing to techPubs is less frequent as the audience/content needs differ, even in the tone/voice of high-level overviews. That said, a DITA TECHNICAL marketing department could be very powerful, but that power/reuse would benefit more from a full DITA implementation (and the folks who write those technical marketing docs can easily pick up the complexities of DITA. They don’t need babyDITA).
So babyDITA does solve some of the conversion pain, but it’s still a rewrite in my neck o’ the woods, so no reuse, no localization cost savings, etc.
One thing I’d like to clarify: I didn’t really say that babyDITA tools were justifiable for crowd-sourcing, but that someone felt there was a market for those tools. Indeed, if all you’re doing is creating generic topics (like I do when I write a novel) and don’t care about the semantic elements, XHTML is the better way to go. If that’s to become part of other information in a production stream, those topics can be integrated with other XML-based content fairly simply. (With DITA you just state that the included content is HTML; I’m not sure how DocBook or some other XML dialect would handle it because I haven’t dug that far into their mechanisms.)
Should everyone be writing in DITA? Probably not. A lot depends on the purpose of the content and how it fits into an organization’s processes. Do Word-based DITA editors have a place? Yes, but it’s important to develop an architectural strategy to integrate that information with the rest of the product stream, and those folks who are using that tool must understand that they are providing input and not necessarily final output.
(Yes, I’m using DITA for my next novel just to get the benefit of multiple outputs from a single source.)
This is an interesting discussion. The real problem everyone is trying to solve is how to get people to write better AND more structured content. Structured content isn’t inherently better, it is just easier to reuse, translate etc.
The babyDITA editors are trying to wrap xml structure on the input without enforcing structure on the author. In the end you don’t really have DITA content. You have text with a bunch of xml tags around it. You haven’t helped people create better, more structured content.
But getting people to learn new tools, new workflows and new ways of writing is hard. It involves change and people don’t like change. We don’t currently support DITA with our authoring tool (mainly because our customer base is not requesting it) but we have found a way to get our customers to be successful using DITA-like practices. Our tool basically focuses on creating how-to lessons that are very similar to DITA task topics. We have people with no training in technical writing or information architecture creating hundreds of task-based topics in a modular fashion.
Here are the keys to how we have helped our customers do this:
1. Enforce structure. We don’t try to tell you you can do anything with our tool. You can’t. There is a structure that is enforced. Our customers tell us that is one of their favorite things. It forces them to focus on the task they are documenting.
2. Make it simple. If someone needs to take a course to realize any benefit from the authoring tool then they aren’t going to adopt it. We focus on helping our users create and share great content in the first 15 minutes of using the tool. This instant success gets them coming back and using it again and again, creating more structured topics.
3. Give them a framework. Saying something should be structured isn’t really helpful to an author. You need to give them a framework to work within. We tell our customers to write down specific customer questions and then answer them with individual topics. Without realizing it, they create great task-based topics because they have a simple workflow that makes sense to them.
So simplicity is important, but so are results. I don’t think the Word-based editors will ever work because they don’t enforce structure. The high-end ones won’t work on a broad basis since they introduce too much complexity. It will be interesting to see if someone can find the right balance.
BabyDITA is the wrong direction, and I agree that these trimmed-down DITA implementations are the wrong direction as well. The tools vendors doing this should be spanked – hard – for these limitations. But, there is a very real market for “idiot-proof” DITA tools. Not every content creator (I’m not even going to use the title “writer”) is a content technologist, nor should they be or be expected to become one. Most people just need to get their work done and move on. BUT, there’s also a huge need for managing the content they produce. Using stripped-down tools that are easy to use or familiar makes it possible for these people to get their work done AND positively contribute to a greater content strategy.
What these tool vendors need to do is consider the full DITA model and allow content strategists/managers to customize them for the specific needs they’re filling.
I think babyDITA is aimed at individuals outside of the technical communication industry, or more specifically at the non-Tech Comms tasked with managing or purchasing tools for authors.
Short on time and resources, overwhelmed with tools that look like code editors, and too embarrassed to admit that the monkeys chained to typewriters have more tech-savvy, getting DITA in a comfort zone like Microsoft Word looks very appealing.
It’s win-win. You successfully avoid change and the authors get that DITA thing that they said would reduce the authoring costs – right? Kinda? Not really.
Great post Sarah! (As usual.)
Let’s all keep in mind that DITA is just a file format. A very rich and powerful format, but fundamentally just a place to store content. Sure, it was designed by a lot of very smart people with certain concepts of reuse and content modeling in mind .. but those are just features that this format supports, along with other more mundane features like indicating which bits of content should be rendered as bold and which are lists.
Just because DITA offers certain features doesn’t mean that everyone should use them. Just because you can tag a topic as a “task” or “concept” doesn’t mean that everyone should do that. You should use only those features in this format/model that will benefit your process or workflow. If you’re doing things that you’re not actually benefiting from, you’re wasting your time. For that reason, I think that Baby DITA (in its many forms) can be the right solution for many people.
In the strictest sense, I’d imagine that most people are using Baby DITA, since very few people are really taking advantage of everything that DITA offers. What’s the difference between one person with a high-powered full-featured editor that complies with the full DITA specification, but only creates simple topics with minimal tagging and no reuse, and another who is using a lightweight editor that doesn’t fully support the specification? If these two people are essentially creating the same files, is one person’s workflow or toolset any better than the other?
Baby DITA is not the same as XHTML. Yes, the underlying structure and tagging may look almost identical, but if it’s a “real” DITA file, you can use the plethora of tools available for creating output. If you’ve got XHTML you’ve got a different set of options for publishing, but not near as powerful. Also, DITA created from one of these lightweight editors can be integrated (without modification) into a set of other DITA files (regardless of their origin) .. XHTML has no such mechanism.
I think that there is definitely a place for these lightweight editors. As long as I’ve been in this field people have struggled with integrating content from Word files into FrameMaker-based books. It never really worked well, but people continue to try to do it. The reason for this was that most of the company had access to Word, and only the tech writers had the more pricey (and complicated) FrameMaker. Baby DITA is the “Word” to DITA’s “FrameMaker” .. but now, those simple documents written by non-technical people and SMEs can seamlessly integrate into the larger documentation set.
I recently learned about Codex as well, and as with you, saw right away its serious limitations .. but I was also very excited about this tool because I see that it has great potential. First .. it’s an AIR app, and I’m an AIR geek, so that was a bonus (OK, not terribly valid), but because it’s AIR, it can install on Mac and Windows (Linux too I assume, but they don’t list that as being supported). Second .. this is a standalone desktop app. As far as I know there are no other lightweight DITA editors that fall into this category. All of the others I’ve seen (and I may not have seen them all), are either browser based or enterprise, both requiring greater expense and support. Third .. I’m assuming that this is just the first pass at a feature set, and they will be adding more element and feature support in future updates, but I really like the general usability and simplicity of this editor.
The thing that I think is very important is that regardless of the level of DITA that’s supported it should be valid and adhere to the specification. So far, most (including Codex) have some flaws that need to be fixed. In many cases, if it were me, I might have waited a bit to release a product .. but that’s not the software model any more.
I won’t personally be using any of these editors, because I like the power that a full featured DITA editor can provide (even if I don’t use more than 10% of the features provided). But I do see that many people just need to write down some simple information and don’t need or want to be encumbered with information modeling or reuse. There’s no reason those people shouldn’t also be storing their content in DITA as well.
Thanks, Sarah, for initiating a discussion that can feed the growth of all editors whether they are full-featured or babyDITA as you described. The article and comments offered many good points and I hope to contribute a few more:
• Your diagram illustrates a critical problem facing organizations today. Typical business documents (including but not limited to techdocs) must move into the right-hand quadrants of the diagram to support knowledge management, process automation, dynamic publishing, and other executive initiatives. Organizations view it as untenable that their most valuable business assets currently are bound within the crap-on-a-page quadrant.
• To get there, organizations must change the way people write—which you point out is a significant hurdle. We hear organizations asking whether they need another barrier to adoption by introducing a new tool, regardless whether that tool is designed for technically-adept full-time authors or a lightweight tool targeted to casual contributors. The “Trojan horse” metaphor is very fitting. We hear a universal chorus of “why can’t we just do this in Word?” which is why Quark adopted the Word platform to begin with.
• The discussion of babyDITA seems to describe a tool that implements a fixed and limited subset of DITA. I think this is too restrictive to describe all of the tools mentioned in the post and comments. There are also what I would call “XML Word Processors” that attempt to completely hide markup and simplify the user interface by implementing technical XML constructs as word processing constructs. In this approach, the schema (DITA or other) is simplified only when the business case requires a subset of the schema.
• As Scott pointed out, everyone only needs a subset of DITA—but the subset is different for different users. We have found that the subset changes with the industry, organization, or even department in question. So inherent in providing a tool that is easy to use is the ability to support different combinations of DITA’s various features. Experience says that providing too few tags to support user requirements can introduce as much complexity as providing too many.
• The problem is not whether we ask users to follow a semantic structure or even to add semantic metadata to a topic. Semantics are not inherently complex; they are only complex when they are not relevant to the user’s area of domain expertise. We have found that non-technical users can (and will) properly apply attributes and inline markup when they are relevant. DITA created in a word processor does not have to be equivalent to dumb DITA.
• We do think this is a very big market because the imperative for what many are now calling “intelligent content” is so universal that it will demand a solution. We believe that DITA has enormous potential for all types of business documents and that there will be a place for many types of editing tools to meet different needs.
Director, XML Products
Co-Chair OASIS DITA for Enterprise Business Documents Subcommittee
Great article Sarah.
You’ve hit upon one of the real conundrums in this industry. People talk about the richness of DITA topic types and then end up limiting themselves to standard topics only. They want full-featured, DITA aware, specialization aware editors, and then use standard DITA DTDs that shipped with DITA 1.0.
Okay, all that aside, the real reason I’m leaving you a comment is to highlight your chart’s excellent axis labels: “Crap on a page” and “Just get it out the door”. Well said!
Great comments, everyone. I think we’re all in agreement that the content shift shown on the chart needs to happen.
We disagree on the specifics of how to get there, or whether I’ve positioned the bubbles correctly.
I look forward to finding out whose analysis is correct in about five years. 🙂
One goal of Codex is to enable any kind of business to more efficiently produce documents by utilizing DITA’s reuse features. There are a wide range of business documents that can benefit from reuse, but do not require a high level of structure (for example, contracts).
Another goal of Codex is to enable organizations to more efficiently channel information into their existing DITA infrastructure (such as SMEs contributing information to technical writing departments). Granted, there may be a trade-off between higher efficiency and lower levels of structure, because Codex does not support highly structured content. But the priority for each organization will be different.
Even Codex’s limited features are enough for some organizations to achieve the above goals. But keep in mind that Codex is still a 1.0 product released just one month ago. Periodic updates will include additional features (including update 1.1 coming later this month).
Feedback from DITA professionals like you ensures that we add important features sooner rather than later. Even complaints and criticisms are helpful! In not so many words, Sarah, you’ve requested that we add support for Codeblock, Tables, and Conref to Codex. Great! They’re already on our radar, but these kinds of comments help us adjust our priorities. Keep ’em coming!
Co-founder and CEO
@Adam, thank you for your gracious response. I doubt that I’d be as nice to someone picking on MY product. I look forward to future developments in our efforts to eradicate crap-on-a-page.
You should have seen my first draft 😉
We’re about to start a pilot with easyDITA. We have a lot of (attempted) collaboration and reuse around some SDK content here, with the “official” version in DITA already, single-sourced/filtered for different operating systems. Another department has reused (copy and paste, of course) content from that in attractive Word documents for customer use.
I was impressed with the simplicity of the easyDITA UI for locating and editing a topic, and the collaboration aspects of it looked very appealing, even fun–effectively, it compressed the writing and editing activities into one session. It will be interesting to see how it’s received here but at this point I can’t think of an easier foray into component content management.
The web-based editor, I’m hoping, will hide the DITA-ness from the subject matter experts, so they can concentrate on putting their product knowledge into one repository, and we’ll have writers edit/curate/apply metadata so we maximize the value of our content and start tracking with the maturity model. This is what I’ve been waiting for: pure DITA that masquerades as a Word doc. 😉
I’ve already tried and failed to get non-tech pubs people authoring with the WYSIWYG (Author) view of oXygen. 😉 It was like Milo in Catch-22 and the chocolate-covered cotton. (If you haven’t read Catch-22, you really must.)
It’s hard to believe that it’s been almost two years since your original post, Sarah… probably because the landscape of DITA editors hasn’t changed dramatically since then. But Codex has gone through a major evolution, and, for the record, Codex 2 now supports all DITA elements, with the goal of making DITA authoring as fast as possible for occasional users.