August 8, 2022

Demystifying content modeling

Content modeling may be the least understood part of structured content—which is saying something. Content modeling is the process of mapping your information’s implicit organization onto an explicit definition.

For example, consider an address. In the United States, an address includes a city, state, and ZIP code. We know that the basic ZIP code is five digits and that there are 50 states, so you can encode both rules in your content model. But immediately, we run into problems. What about Puerto Rico or the District of Columbia? American Samoa? Will you support ZIP+4? And that’s just a US-only scenario!
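To make the address example concrete, here is a minimal Python sketch of how those rules might be encoded. The function name `validate_address` and the flags `allow_territories` and `allow_zip4` are hypothetical, chosen only for illustration; a real model would need to settle those questions up front rather than leave them as switches.

```python
import re

# The 50 states, by USPS abbreviation.
STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)
# The three non-state cases named in the text: District of Columbia,
# Puerto Rico, American Samoa. Other territories are left out on purpose.
TERRITORIES = {"DC", "PR", "AS"}

ZIP5 = re.compile(r"\d{5}")
ZIP5_OR_PLUS4 = re.compile(r"\d{5}(-\d{4})?")


def validate_address(city, region, zip_code, *,
                     allow_territories=False, allow_zip4=False):
    """Return a list of problems; an empty list means the address fits the model."""
    problems = []
    if not city.strip():
        problems.append("city is empty")
    allowed = STATES | TERRITORIES if allow_territories else STATES
    if region not in allowed:
        problems.append(f"unknown region: {region}")
    pattern = ZIP5_OR_PLUS4 if allow_zip4 else ZIP5
    if not pattern.fullmatch(zip_code):
        problems.append(f"ZIP code does not match the model: {zip_code}")
    return problems
```

With the strict defaults, `validate_address("San Juan", "PR", "00901")` reports an unknown region, and a ZIP+4 value such as `"27701-1234"` is rejected. Each flag you turn on is a content modeling decision.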

When we look at narrative content, like a white paper or a task, we run into similar issues. Most of our content isn’t as highly constrained as a ZIP code field, but we still want to figure out whether and how we can encode content requirements into the content model.

In an unstructured document, the document formatting tells us the meaning of a particular piece of content. The process of content modeling lets us take those formatting cues and translate them into semantic tags, like warning, procedure, or byline.
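As a rough illustration of that translation, the sketch below maps formatting-based style names to semantic tags and emits simple XML. The style names, the `topic` wrapper element, and the function `to_semantic_xml` are all assumptions made for this example; they do not come from DITA, DocBook, or any particular tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy style names (formatting cues) and the semantic
# tags they would map to in the target content model.
STYLE_TO_TAG = {
    "Red Bold Box": "warning",
    "Numbered Steps": "procedure",
    "Italic Under Title": "byline",
}


def to_semantic_xml(paragraphs):
    """Convert (style, text) pairs into semantically tagged XML.

    Styles without a mapping fall back to a generic paragraph tag.
    """
    root = ET.Element("topic")
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "p")
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")


print(to_semantic_xml([
    ("Italic Under Title", "By A. Writer"),
    ("Red Bold Box", "Unplug the unit first."),
    ("Body", "Plain text."),
]))
```

The output no longer says how the byline or warning looked, only what each piece of content is.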

In most cases, the content modeling effort is fairly limited. You don’t have to reinvent the wheel on addresses; many people have already done that work. Similarly, for technical content, you are almost certainly going to start with an existing content model, either a standard like DITA or DocBook, or the default model provided by your content management system.

Assuming that you have a target content model, your effort looks like this:

  1. Identify and map all of the “common” components, like paragraphs, list items, warnings, and so on.
  2. Identify the “outlier” components. Outliers we’ve seen include unique warning labels, like topple warnings or radiation warnings; commentary or analysis tags that interrupt a narrative; and highly structured tables that need to be tagged semantically (not just as table/row/cell). Outliers are often the most valuable part of your content, so you need to figure out how to add them to your target content model.
  3. Identify any additional components that you want to add that are not present in the legacy content, and modify the content model to support them.
  4. Develop labels to help you classify and manage your content; for example, to restrict some information to internal users.
  5. Consider how you want to manage content variants and modify the content model as needed. For example, you might provide just the basics for beginners and more details for system administrators.
  6. Consider how you want to manage reuse; modify the content model as needed.
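Parts of this checklist can be sketched in code. The example below covers steps 1 and 2 (sorting legacy styles into common components and outliers) and steps 4 and 5 (labels and audience variants). Everything in it is hypothetical: the style names, the `TARGET_MODEL` mapping, the `Component` class, and the audience labels are illustrations, not part of any standard.

```python
from dataclasses import dataclass

# Hypothetical mapping from legacy styles to components that already
# exist in the target content model.
TARGET_MODEL = {"Body Text": "p", "Bullet": "li", "Warning": "warning"}


def audit_styles(styles_in_use):
    """Split legacy styles into mapped 'common' components and 'outliers'."""
    seen = set(styles_in_use)
    common = {s: TARGET_MODEL[s] for s in seen if s in TARGET_MODEL}
    outliers = sorted(s for s in seen if s not in TARGET_MODEL)
    return common, outliers


@dataclass(frozen=True)
class Component:
    tag: str
    text: str
    # Labels: which audiences see this component, and whether it is
    # restricted to internal users.
    audiences: frozenset = frozenset({"beginner", "admin"})
    internal_only: bool = False


def build_variant(components, audience, internal_user=False):
    """Keep only the components that apply to this audience and user type."""
    return [
        c for c in components
        if audience in c.audiences and (internal_user or not c.internal_only)
    ]
```

For instance, `audit_styles(["Body Text", "Topple Warning", "Warning"])` flags `"Topple Warning"` as an outlier that the target model still has to accommodate, and `build_variant(components, "beginner")` drops anything labeled for administrators only or for internal users.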

Even if you intend to use a standard like DITA “as is” (without customization), it’s important to work through these steps to ensure that all of your content requirements are covered and that your content is mapped consistently.

Contact us if you need help with your content modeling.