What is XML?

Extensible Markup Language (XML) defines a standard for storing structured content in text files. The standard is maintained by the World Wide Web Consortium (W3C). (Detailed information: http://www.w3.org/XML/)

XML is closely related to other markup languages, such as Standard Generalized Markup Language (SGML). Implementing SGML is an enormous undertaking. Because of this complexity, SGML’s acceptance has been limited to industries producing large volumes of highly structured information (for example, aerospace, telecommunications, and government).

XML is a simplified form of SGML that’s designed to be easier to implement. (SGML vs. XML details: http://www.w3.org/TR/NOTE-sgml-xml-971215) As a result, XML is attractive to many industries that create technical documents (including parts catalogs, training manuals, reports, and user guides).

XML syntax

XML is a markup language, which means that content is enclosed by tags. In XML, element tags are enclosed in angle brackets:

<element>This is element text.</element>

A closing tag is indicated by a forward slash in front of the element name.

Attributes are stored inside the element tags:

<element my_attribute="my_value">This is element text.</element>
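
To see how software reads these pieces, here is a minimal sketch that parses the attribute example above. It assumes Python and its standard-library xml.etree.ElementTree module, neither of which is named in this paper; any XML parser exposes the same three parts.

```python
import xml.etree.ElementTree as ET

# Parse the attribute example from the text above.
element = ET.fromstring(
    '<element my_attribute="my_value">This is element text.</element>'
)

tag = element.tag                        # the element name inside the angle brackets
attribute = element.get("my_attribute")  # the attribute value stored in the opening tag
text = element.text                      # the content enclosed by the tags

print(tag, attribute, text)
```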

XML does not provide a set of predefined tags. Instead, you define your own tags and the relationships among the tags. This makes it possible to define and implement a content structure that matches the requirements of your information. Figure 3 shows an XML file that contains a recipe.

<Recipe Cuisine = "Italian" Author = "Unknown">
  <Name>Marinara Sauce</Name>
  <IngredientList>
    <Ingredient>
      <Quantity>2 tbsp.</Quantity>
      <Item>olive oil</Item>
    </Ingredient>
    <Ingredient>
      <Quantity>2 cloves</Quantity>
      <Item>garlic</Item>
    </Ingredient>
    <Ingredient>
      <Quantity>1/2 tsp.</Quantity>
      <Item>hot red pepper</Item>
    </Ingredient>
    <Ingredient>
      <Quantity>28 oz.</Quantity>
      <Item>canned tomatoes, preferably San Marzano</Item>
    </Ingredient>
    <Ingredient>
      <Quantity>2 tbsp.</Quantity>
      <Item>parsley</Item>
      <Preparation>chopped</Preparation>
    </Ingredient>
  </IngredientList>
  <Instructions>
    <Para>Heat olive oil in a large saucepan on medium. Add garlic and hot red pepper and sweat until fragrant. Add tomatoes, breaking them up into smaller pieces. Simmer on medium-low heat for at least 20 minutes. Add parsley and simmer for another five minutes. Serve over long pasta.</Para>
  </Instructions>
</Recipe>

Figure 3: A recipe in XML
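
Because the tags describe what the content is, a program can pull out exactly the pieces it needs. The following sketch builds a shopping list from a shortened copy of the recipe. Python, the xml.etree.ElementTree module, and the function name shopping_list are assumptions made for illustration; they are not part of the original recipe example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe in Figure 3 (two ingredients only).
RECIPE_XML = """
<Recipe Cuisine="Italian" Author="Unknown">
  <IngredientList>
    <Ingredient>
      <Quantity>2 tbsp.</Quantity>
      <Item>olive oil</Item>
    </Ingredient>
    <Ingredient>
      <Quantity>2 tbsp.</Quantity>
      <Item>parsley</Item>
      <Preparation>chopped</Preparation>
    </Ingredient>
  </IngredientList>
</Recipe>
"""


def shopping_list(recipe_xml):
    """Return one 'quantity item' string per Ingredient element (hypothetical helper)."""
    recipe = ET.fromstring(recipe_xml)
    lines = []
    for ingredient in recipe.iter("Ingredient"):
        quantity = ingredient.findtext("Quantity")
        item = ingredient.findtext("Item")
        preparation = ingredient.findtext("Preparation")  # None when the element is absent
        line = f"{quantity} {item}"
        if preparation:
            line += f", {preparation}"
        lines.append(line)
    return lines


print(shopping_list(RECIPE_XML))
```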

XML is said to be well-formed when basic tagging rules are followed. For example:

<element>This element has content</element>

<empty_element />

<element attribute="name">This is a legal attribute</element>

<element attribute=name>This is not well-formed.</element>

<element>This is <strong>correct.</strong></element>

<element>This is<strong>not correct.</element></strong>
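
You can confirm these results with any XML parser, because a parser must reject input that is not well-formed. The sketch below runs the six examples through Python's standard-library parser. The choice of Python and the helper name is_well_formed are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The six examples from the text, in the same order.
examples = [
    '<element>This element has content</element>',
    '<empty_element />',
    '<element attribute="name">This is a legal attribute</element>',
    '<element attribute=name>This is not well-formed.</element>',
    '<element>This is <strong>correct.</strong></element>',
    '<element>This is<strong>not correct.</element></strong>',
]

results = [is_well_formed(example) for example in examples]
for example, ok in zip(examples, results):
    print("well-formed" if ok else "NOT well-formed", "|", example)
```

The unquoted attribute value and the overlapping tags are the only two examples the parser rejects.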

XML is said to be valid when the structure of the XML matches the structure specified in the structure definition. When the structure does not match, the XML file is invalid (Figure 4).

Figure 4: Invalid structure

Entities

An XML entity is a placeholder. Entities allow you to reuse information; for example, you could define an entity for a copyright statement:

<!ENTITY copyright "Copyright 2008 Scriptorium Publishing Services, Inc. All rights reserved.">

To reference the entity, you refer to the entity name:

&copyright;

The entity text is displayed instead of the entity name:

Copyright 2008 Scriptorium Publishing Services, Inc. All rights reserved.

Storing common information in entities lets you make a change in one location (the entity definition) and have the change show up everywhere that references the entity.
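
The following sketch shows the substitution taking place. It assumes Python's standard-library parser, and it declares the entity in the document's own DOCTYPE so that the example is self-contained. The Legal wrapper element is hypothetical and is not part of any structure described in this paper.

```python
import xml.etree.ElementTree as ET

# The entity is declared once, in the DOCTYPE, and referenced in the content.
# The <Legal> wrapper is a made-up element used only for this illustration.
DOCUMENT = """<!DOCTYPE Legal [
  <!ENTITY copyright "Copyright 2008 Scriptorium Publishing Services, Inc. All rights reserved.">
]>
<Legal><Para>&copyright;</Para></Legal>"""

root = ET.fromstring(DOCUMENT)

# The parser replaces the reference with the entity text while reading the file.
expanded = root.findtext("Para")
print(expanded)
```

Changing the text in the entity declaration changes every Para that references it the next time the file is parsed.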

Entities are also used to include information that can’t be easily rendered as text. Graphics, for example, can be referenced as entities. In the following example, the entity definition contains the entity name, graphic file name, and file type:

<!ENTITY my_image SYSTEM "image.gif" NDATA gif>

In the XML file, a Graphic element references this entity:

<Graphic entity = "my_image" />
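
A processing tool connects the two pieces by looking up the entity name from the Graphic element in the list of declared entities. The sketch below is one way to do that lookup, assuming Python's standard-library xml.parsers.expat module. The Doc wrapper element and the notation declaration for gif are assumptions added to make the example complete.

```python
import xml.parsers.expat

# The <Doc> wrapper and the NOTATION declaration are assumptions for this sketch.
DOCUMENT = """<!DOCTYPE Doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY my_image SYSTEM "image.gif" NDATA gif>
]>
<Doc><Graphic entity="my_image" /></Doc>"""

graphic_files = {}  # entity name -> (file name, file type)
referenced = []     # entity names used by Graphic elements


def on_entity_decl(name, is_parameter_entity, value, base,
                   system_id, public_id, notation_name):
    # Entities that carry a notation (NDATA) point at non-XML files such as graphics.
    if notation_name is not None:
        graphic_files[name] = (system_id, notation_name)


def on_start_element(name, attributes):
    if name == "Graphic":
        referenced.append(attributes["entity"])


parser = xml.parsers.expat.ParserCreate()
parser.EntityDeclHandler = on_entity_decl
parser.StartElementHandler = on_start_element
parser.Parse(DOCUMENT, True)

# Resolve each Graphic reference to the file named in the entity definition.
resolved = [graphic_files[name][0] for name in referenced]
print(resolved)
```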

How are XML and structured authoring related?

Structured authoring is a concept. XML is a specification that lets you implement structured authoring using plain text files. In the past, most structured authoring implementations were based on SGML; today, XML is the standard. The terms XML and structured authoring are often used almost interchangeably.

Unlike SGML, XML is widely used outside the technical publishing world, especially for data interchange and web services applications.

Defining structure in XML

In XML, you define your structure using either a DTD or a schema. In either case, you specify elements and how they are related to each other. For example, a Recipe element definition might read as follows in a DTD:

<!ELEMENT Recipe (Name, History?, IngredientList, Instructions)>

In an XML schema, the definition is itself an XML document. A simplified definition of the Recipe element would read as follows:

<xsd:complexType name="Recipe">
  <xsd:sequence>
    <xsd:element name="Name" type="xsd:string"/>
    <xsd:element name="History" type="xsd:string" minOccurs="0" maxOccurs="1"/>
    <xsd:element name="IngredientList" type="xsd:string"/>
    <xsd:element name="Instructions" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

Once you define the structure, authors create documents that comply with the structure. At a bare minimum, this allows you to specify, for instance, that the list of ingredients in a recipe must occur before the instructions.
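
To make the idea of compliance concrete, the sketch below checks the order of a Recipe element's children against the DTD rule shown earlier. This is a hand-rolled illustration in Python, not a real validator: the rule is translated by hand into a regular expression, only the direct children of Recipe are checked, and the name recipe_is_valid is hypothetical. In practice, a validating parser reads the DTD or schema and does this work for you.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from the DTD rule:
#   <!ELEMENT Recipe (Name, History?, IngredientList, Instructions)>
# Each child name is followed by a comma so that names cannot run together.
RECIPE_MODEL = re.compile(r"Name,(History,)?IngredientList,Instructions,")


def recipe_is_valid(xml_text):
    """Return True if the children of Recipe appear in the order the rule requires."""
    recipe = ET.fromstring(xml_text)
    if recipe.tag != "Recipe":
        return False
    child_sequence = "".join(child.tag + "," for child in recipe)
    return RECIPE_MODEL.fullmatch(child_sequence) is not None


# Empty elements keep the samples short; only the order of the children matters here.
in_order = "<Recipe><Name/><IngredientList/><Instructions/></Recipe>"
with_history = "<Recipe><Name/><History/><IngredientList/><Instructions/></Recipe>"
wrong_order = "<Recipe><Name/><Instructions/><IngredientList/></Recipe>"

print(recipe_is_valid(in_order))
print(recipe_is_valid(with_history))
print(recipe_is_valid(wrong_order))
```

The third sample is well-formed XML, but it is invalid because the instructions appear before the list of ingredients.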

Schemas are especially useful in XML-based programming applications, where they allow you to validate and restrict data inside the structure. DTDs are more common in publishing applications, partly because of the legacy with SGML. In long, technical documents that consist mostly of paragraphs, the validation provided by schemas does not add a significant amount of value.

 



Scriptorium Publishing | Post Office Box 12761 Research Triangle Park, NC 27709 | (919) 481 2701 | info@scriptorium.com