Architecture overview
This section describes the features that make the DITA architecture especially interesting.
Topics as basic content units
The default content unit in DITA is the topic. (In DocBook, it’s the book.) Topics are assembled to create deliverables, and you can reuse topics in as many deliverables as necessary.
By default, DITA ships with four topic types: task, reference, concept, and the basic topic. You can also define additional topic types. This emphasis on information typing results in a library of content that separates procedural, conceptual, and reference information.
The topic-oriented architecture requires that authors create modular, self-contained information. For content creators who are accustomed to working on cohesive books, this can be a rather difficult transition.
One topic (sorry!) of heated discussion is the issue of “glue text,” the content that provides coherent transitions from one topic to another. Some argue that glue text is unnecessary and that transitions are overrated; at the other extreme is the opinion that modules without transitions are unusable. If you belong to the latter group, keep in mind that implementing transitional text in DITA is quite difficult. Transition text that makes sense in one context might not be relevant in another.
Reuse with map files
DITA provides two major reuse mechanisms: maps and references. DITA maps provide a list of links, in a particular sequence and hierarchy, that describe the content of a deliverable, as shown in the following example:

<?xml version="1.0"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map title="Zoo Policies">
  <topicref href="Animal_nutrition.xml">
    <topicref href="Aardvark.xml"/>
    <topicref href="Baboon.xml"/>
    <topicref href="Crane.xml"/>
    <topicref href="Dingo.xml"/>
  </topicref>
  <topicref href="Visitor_behavior.xml">
    <topicref href="Adults.xml"/>
    <topicref href="Children.xml"/>
  </topicref>
</map>
Map files are similar to FrameMaker book files, but offer more flexibility—a map file can contain references to other map files, and topics can be nested more than one level.
Typically, you would create a map file to support each major deliverable. Thus, components of the Zoo Policies shown in the preceding example could be reused in an Animal Care Guide. The information about animal nutrition might be provided in both documents, but the discussion of visitor policies would appear only in the policy manual. Multiple map files can reference the same topic as necessary.
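As a sketch of this kind of reuse, a second map for the Animal Care Guide might look like the following. The Habitat_maintenance.xml topic is a hypothetical file name added for illustration; the Animal_nutrition.xml branch points to the same files that the Zoo Policies map references.

```xml
<?xml version="1.0"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map title="Animal Care Guide">
  <!-- Same topic files that the Zoo Policies map references -->
  <topicref href="Animal_nutrition.xml">
    <topicref href="Aardvark.xml"/>
    <topicref href="Baboon.xml"/>
    <topicref href="Crane.xml"/>
    <topicref href="Dingo.xml"/>
  </topicref>
  <!-- Hypothetical topic that appears only in this guide -->
  <topicref href="Habitat_maintenance.xml"/>
</map>
```

Because both maps point to the same files, a change to Aardvark.xml shows up in both deliverables the next time they are published.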
Reuse with content references
For reuse inside a topic, DITA provides a content referencing (conref) mechanism. The element you want to reuse carries an ID that is unique within its topic, and the topic itself also has an ID. In this example, the <note> element is set up with an ID for reuse:
<topic id="aardvark">
  <title>Aardvark</title>
  <body>
    <p>Aardvarks eat mostly termites.</p>
    <note type="danger" id="nofeeding">Do not feed snacks, scraps, or people food to the animals.</note>
  </body>
</topic>
You then create a reference to the element you want to reuse by specifying the file name, the topic ID, and the element ID in a conref attribute:
<topic id="baboon">
  <title>Baboon</title>
  <body>
    <p>Baboons eat mostly fruit.</p>
    <note type="danger" conref="Aardvark.xml#aardvark/nofeeding"/>
  </body>
</topic>
One difficulty with conrefs is that they are quite tedious to create and manage manually. If you plan to use conrefs, consider how well your potential authoring tools support content creation and management.
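To make the mechanism concrete, here is roughly what the Baboon topic amounts to once a processor has resolved the conref. This is a sketch of the effective content, not literal tool output; the attributes a given processor leaves on the resolved element may differ.

```xml
<topic id="baboon">
  <title>Baboon</title>
  <body>
    <p>Baboons eat mostly fruit.</p>
    <!-- Content pulled in from the note with id="nofeeding" in the Aardvark topic -->
    <note type="danger">Do not feed snacks, scraps, or people food to the animals.</note>
  </body>
</topic>
```

The warning text lives in one place only, so a correction to the Aardvark topic reaches every topic that references the note.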
Conditional content
The DITA architecture provides support for attribute-based versioning. That is, if you have some content that is intended only for some deliverables, you label the content with the relevant attributes:
<topic audience="internal">
  <title>Secret Settings in our Software</title>
</topic>
<topic>
  <title>Setting up Your Project</title>
</topic>
<topic audience="external">
  <title>Controlling Your Files</title>
</topic>
When you publish content, you use a settings file called ditaval to specify which information to exclude from the final output. In the preceding example, a deliverable for internal users does not require a ditaval file because internal users get all of the information (internal, external, and all audiences). For external users, you need to suppress the internal-only information, so you set up a ditaval file as follows:
<val>
<prop att="audience" val="internal" action="exclude"/>
</val>
You can also set up more complex conditional processing with intersecting attributes. For instance, you could produce the Linux-specific version of a document for external users by excluding the internal and Windows-specific information, as shown here:
<val>
<prop att="audience" val="internal" action="exclude"/>
<prop att="platform" val="win" action="exclude"/>
</val>
You will need a ditaval file for each combination of conditions, or you can create some additional processing to create a custom ditaval file when you are ready to publish your content.
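For example, the companion filter file for a Windows-specific version for external users might look like the following sketch. The value linux is an assumption; use whatever values your content actually carries in its platform attributes.

```xml
<val>
  <!-- Drop internal-only content, as in the earlier examples -->
  <prop att="audience" val="internal" action="exclude"/>
  <!-- Drop Linux-specific content; "linux" is an assumed attribute value -->
  <prop att="platform" val="linux" action="exclude"/>
</val>
```

With two audiences and two platforms, that is already up to four filter files to maintain, which is why generating them at publishing time can be attractive.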
Customization
You can customize in three different ways:
- Subsetting
- Specialization
- Extension
Subsetting
Subsetting means that you remove extraneous elements. For instance, consider the standard <body> element definition in DITA:
(p or lq or note or dl or parml or ul or ol or sl or pre or codeblock or msgblock or screen or
lines or fig or syntaxdiagram or imagemap or image
or object or table or simpletable or required-cleanup or section
or example) (any number)
After reviewing your content, you decide that you do not need the pre, codeblock, msgblock, screen, and syntaxdiagram elements. You remove those elements and create a new, shorter definition for body:
(p or lq or note or dl or parml or ul or ol or sl or lines
or fig or imagemap or image or object or table or simpletable or
required-cleanup or section or example) (any number)
By subsetting, you create a smaller, more manageable content model, which in turn makes life a little easier for your authors. Provided that the elements you remove were optional in the original structure, the subsetted structure is still valid DITA content. The subsetting process creates a smaller set of allowable structures within the original DITA universe.

Specialization
Specialization is a unique design feature provided by DITA. According to the DITA Specification:
“Specialization allows you to define new kinds of information (new structural types or new domains of information), while reusing as much of existing design and code as possible, and minimizing or eliminating the costs of interchange, migration, and maintenance. Specialization is used when new structural types or new domains are needed.”
Specialization provides a partial solution to the customization problem. If the provided standard doesn’t meet your requirements, you can often specialize an existing element. The various components of the DITA toolchain (including the output processing tools) can process the specialized element based on its class. Without specialization, you would have to make extensive changes to accommodate every new element you add.
If the element you need to create does not have a reasonable equivalent in the basic DITA structure, specialization may not work. When you specialize, the new element must use the structure of the parent element or a subset of the structure. You cannot specialize an element and then create a structure that is looser than what is permitted in the original element.
The DITA specialization mechanism provides a partial solution for a long-standing limitation of DocBook and other standards: if your content requires you to diverge from the published standard, you quickly end up with a custom implementation. You can restrict a content model by using only a subset of the provided elements and stay within the standard. If, however, you need to add elements, you end up with a structure that does not conform to the standard, as shown in Figure 3.
DITA supports caution content using a note with a type attribute:
<note type="caution">Text of the caution.</note>
Because cautions and notes use the same element (<note>), you cannot allow cautions in locations where notes are not allowed or otherwise differentiate between caution and note content. If you need to treat notes and cautions differently in the structure, you need separate elements. If you simply add new elements, such as <warning>, <caution>, and <sidebar>, you end up with a structure that is no longer DITA-compliant.

If you specialize instead, you can add new elements and still have content that is DITA-compliant, as shown in Figure 4.

For specialized elements, Open Toolkit processing uses the default processing of the parent element (that is, <note> for the <warning> element in the specialization example). As a result, you can extend the DITA element set without also having to modify the output processing templates.
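The fallback works because every DITA element carries a class attribute (normally defaulted by the DTD, so authors never type it) that records the element's ancestry. The following XSLT sketch shows the general pattern of matching on that ancestry instead of on the element name. The module name zoo-d is hypothetical, and the template is a simplified illustration rather than a copy of the Open Toolkit's own stylesheets.

```xml
<?xml version="1.0"?>
<!-- Sketch of class-based matching. The input is assumed to look like:
       <warning class="+ topic/note zoo-d/warning ">...</warning>
     where "zoo-d" is a hypothetical specialization module. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Matches <note> and anything specialized from it, because the
       test looks at the class ancestry rather than the element name -->
  <xsl:template match="*[contains(@class, ' topic/note ')]">
    <div class="note">
      <xsl:apply-templates/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

A plain <note>, whose class value is "- topic/note ", and the hypothetical <warning> both satisfy the same test, so the specialized element is formatted like a note until you write something more specific.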
You can, of course, also add customized processing for your specialized elements. If, for example, you want to output the information in the <sidebar> element into a traditional sidebar in your printed documents, you would need to modify the output processing files to provide that feature.
NOTE: Attribute specialization is not supported in DITA 1.0.
If you use DITA’s specialization mechanism, you can use a generalization process to transform your specialized content back to generic DITA content. Generalization is useful when you plan to exchange information with another organization that does not use your specializations.
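A sketch of what generalization does to a single element, assuming a hypothetical <warning> element specialized from <note> in a module named zoo-d. The class value is typically kept on the generalized element so that the content can be specialized again later; treat the exact attribute handling as an assumption and check what your toolchain produces.

```xml
<!-- Before generalization: specialized markup ("zoo-d" is hypothetical) -->
<warning class="+ topic/note zoo-d/warning ">Do not feed snacks, scraps, or people food to the animals.</warning>

<!-- After generalization: generic DITA element, class ancestry preserved -->
<note class="+ topic/note zoo-d/warning ">Do not feed snacks, scraps, or people food to the animals.</note>
```

The receiving organization can process the second form with a stock DITA setup, because it contains only elements defined in the standard.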
Extension
If you cannot build the structure you need with subsetting and specialization, you need to consider extending DITA. When you extend, you create a structure that no longer conforms to the standard specification (see Figure 3). This means you have to extend the entire toolchain to support your customizations.
The DITA community, in general, frowns on extension. They recommend using the specialization mechanism to maintain interoperability with other DITA content. However, starting with DITA and making customizations could be less time-consuming than building the structure from the ground up.
Generating output
XML is generally not appropriate as a delivery format, so content must be published to HTML, PDF, CHM, and the like. The effort required to configure publishing streams is significant and may involve multiple sets of tools and technologies. For example, creating plain HTML from XML requires someone who, at a minimum, understands Extensible Stylesheet Language Transformations (XSLT), HTML tagging, and Cascading Style Sheets (CSS). Output to PDF/print is even more complex, requiring knowledge of XSL Formatting Objects (XSL-FO), structured FrameMaker configuration, the E3 publishing engine from Arbortext, or a similarly complex toolset.
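To give a sense of the moving parts, here is a minimal XSLT 1.0 sketch that turns a simple topic (a title, plus paragraphs and notes in a body) into plain HTML. It matches on element names for brevity and ignores everything a production stylesheet must handle, such as other elements, links, maps, conrefs, and conditional filtering.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- One HTML page per topic -->
  <xsl:template match="/topic">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="body/*"/>
      </body>
    </html>
  </xsl:template>

  <!-- Paragraphs pass through as HTML paragraphs -->
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Notes become a block whose look is left to CSS -->
  <xsl:template match="note">
    <div class="note {@type}"><xsl:apply-templates/></div>
  </xsl:template>
</xsl:stylesheet>
```

Even this toy version touches XSLT, HTML, and CSS class names, which is the skill mix described above.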
The DITA Open Toolkit provides a set of transformation files for several types of output. This gives DITA-based authors a starting point for creating their final output. Configuring and customizing the Open Toolkit files is not trivial, but it’s probably easier than building the entire publishing workflow on your own.
Several commercial tools, including Arbortext Editor, XMetaL, and structured FrameMaker, provide helpful integration of DITA output based on the Open Toolkit.
Copyright © 2008 Scriptorium Publishing Services, Inc. All rights reserved.
