Tips for developing a taxonomy in DITA
When you’re coming up with a metadata strategy for your content, you should start by developing a taxonomy: a hierarchy used to organize metadata. A taxonomy will help shape your metadata strategy and make implementation of that strategy possible. In this follow-up post to “Making metadata in DITA work for you,” you’ll learn some tips for creating a taxonomy that helps your audience, both internal and external, find what they need.
A couple of disclaimers:
- This is a simple overview of taxonomies, intended for those with little or no experience in developing them.
- Some industries (such as medical or pharmaceutical) use pre-defined terminology, so their taxonomies will need to match those terms rather than reinvent them.
Categorizing your content
The first and most important step in developing a taxonomy is deciding how your content should be organized for filtering and search. Start by making a list of categories (or subsets) that your audience might need, such as all of the documentation for one product or all of the data sheets for products sold in a certain location. Keep your internal audience in mind, as well—your tech pubs team will probably need to find content according to author, review status, or release version.
Once you have your list of possible categories, evaluate them to determine which ones should be included in your taxonomy. Maybe a large portion of your audience needs to be able to search your content by product, but only a small percentage needs to search by region. You may also find audience demand for a category you didn’t include in your list. Audience metrics and feedback are crucial in helping you make your final decisions—if you’re not already capturing this information, now is a good time to start.
The categories you choose will inform the metadata attributes (or, in some cases, elements) that your audience will use to track your content. Your list might look something like this:
- author
- audience
- document title
- document type
- product
- region
- review status
Adding value to your categories
Now that you have a list of categories to track, your next step is to add value to them by listing all the options from which your audience will choose for each category. Some of these options will be pre-determined—for example, the options for the region category will be the names of the locations where your company sells its products. However, coming up with options for other categories will require more thought. Does it make the most sense to track the audience category according to your audience’s experience level (beginner, intermediate, advanced), occupation (writer, developer, engineer, manager), or some other factor? Again, any information you can gather from your audience will be useful.
Another consideration is whether your audience will have to choose from a strict, pre-set list of options or be able to add their own. For example, do you plan to have your technical writers choose their names from an existing list for the author category, or will you allow them to enter their names? It may be more convenient for new writers if they can add their names to the list of options, but if security and consistency are more important to your company (and if you want to prevent people from making careless typos), you may choose to have a strictly controlled list of authors, instead.
As you fill out your list of categories with options, you may realize that you need sub-categories in some cases. For example, your audience may need to track the product category in multiple ways, such as by product name, product ID number, and product group. Each sub-category of product would have its own list of options—for example, your audience could choose between software and hardware for product group.
Just as the categories in your list will become metadata attributes, the options for each category will become those attributes’ values. Your list should now look something like this:
- author
  - (all author names)
- audience
  - beginner
  - intermediate
  - advanced
- document title
  - (all document titles)
- document type
  - user guide
  - data sheet
- product
  - name
    - (all product names)
  - number
    - (all product ID numbers)
  - group
    - software
    - hardware
- region
  - Asia
  - Europe
  - North America
- review status
  - edited
  - approved
  - completed
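To make the mapping from taxonomy to markup more concrete, here is a minimal sketch of how some of these categories might be recorded in the prolog of a standard DITA topic. The topic ID, author name, product name, and product number are hypothetical placeholders. Using othermeta for region and review status is only one possible approach, not a recommendation. Note also that the standard experiencelevel attribute has traditionally used the values novice, general, and expert, so a taxonomy value such as beginner may need to be mapped to one of those.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<!-- All names and values below are hypothetical examples. -->
<topic id="installing-the-widget">
  <!-- document title -->
  <title>Installing the Widget</title>
  <prolog>
    <!-- author -->
    <author>Jane Doe</author>
    <metadata>
      <!-- audience: "beginner" mapped to the standard value "novice" -->
      <audience experiencelevel="novice"/>
      <!-- document type, recorded here as a category (an assumption) -->
      <category>user guide</category>
      <!-- product: name and ID number -->
      <prodinfo>
        <prodname>Widget</prodname>
        <vrmlist>
          <vrm version="1.0"/>
        </vrmlist>
        <prognum>W-1000</prognum>
      </prodinfo>
      <!-- categories with no dedicated standard element -->
      <othermeta name="region" content="Europe"/>
      <othermeta name="review-status" content="approved"/>
    </metadata>
  </prolog>
  <body>
    <p>Topic content goes here.</p>
  </body>
</topic>
```

Categories that end up in othermeta, as region and review status do in this sketch, are the ones to look at most closely when you weigh the need for specialization.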
To specialize or not to specialize?
Once you have decided on a taxonomy, you should use it to help you decide whether or not your metadata strategy should involve specialization. How many of the attributes and values you’ve listed in your taxonomy are available in standard DITA? How necessary are the ones that aren’t available? How much effort will be required for specialization, and is it worth the cost? The more solid your taxonomy is, the better equipped you’ll be to answer these questions and come up with a business case for specialization if you need it.
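As a rough illustration of that audit, the fragment below tags list items with conditional attributes that standard DITA already provides. The audience and product attributes cover two of the example categories directly. The region category has no standard attribute, so this sketch falls back on otherprops. The value names, including the region-europe naming convention, are assumptions made for illustration.

```xml
<!-- Fragment from a topic body; all values are hypothetical. -->
<ul>
  <!-- audience and product are standard conditional attributes -->
  <li audience="beginner">Start with the quick-start checklist.</li>
  <li product="software">Download the installer.</li>
  <!-- no standard region attribute exists, so otherprops is used as a fallback -->
  <li otherprops="region-europe">Check the regional power requirements.</li>
</ul>
```

If too many of your categories end up crowded into otherprops, that is one sign that a specialized attribute (or a subject scheme) may be worth the effort.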
It’s also important to remember that you can still adjust your taxonomy while you are in the pre-implementation phase of your strategy. A taxonomy can and should evolve, but changes get more expensive after implementation. If your content is complex and you will need a multi-faceted metadata hierarchy for filtering it, you can refine your taxonomy to reflect this. On the other hand, if you don’t have enough of a business case for specialization, maybe you can eliminate the need to specialize by removing or changing just one attribute in your taxonomy.
Subject scheme
If you need custom values that can be easily changed or updated, you might consider creating a subject scheme from your taxonomy. With a subject scheme, you use key definitions in a map to define a collection of controlled values and their hierarchy or relationships. Because the values live in a map rather than in a DTD, you can maintain more than one subject scheme and choose which one to apply, and you can create custom values without developing a specialization or modifying a DTD.
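A minimal sketch of such a subject scheme map, built from the audience and product group categories in the example taxonomy, might look like the following. The key names (audience-levels, product-groups, and the individual values) are assumptions chosen for illustration. The subjectdef elements define the controlled values and their hierarchy, and each enumerationdef binds one set of values to an attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE subjectScheme PUBLIC "-//OASIS//DTD DITA Subject Scheme Map//EN" "subjectScheme.dtd">
<subjectScheme>
  <!-- Controlled values for the audience category (key names are hypothetical) -->
  <subjectdef keys="audience-levels">
    <subjectdef keys="beginner"/>
    <subjectdef keys="intermediate"/>
    <subjectdef keys="advanced"/>
  </subjectdef>

  <!-- Controlled values for the product group sub-category -->
  <subjectdef keys="product-groups">
    <subjectdef keys="software"/>
    <subjectdef keys="hardware"/>
  </subjectdef>

  <!-- Bind each set of values to the attribute that may use it -->
  <enumerationdef>
    <attributedef name="audience"/>
    <subjectdef keyref="audience-levels"/>
  </enumerationdef>
  <enumerationdef>
    <attributedef name="product"/>
    <subjectdef keyref="product-groups"/>
  </enumerationdef>
</subjectScheme>
```

With bindings like these in place, tools that support subject schemes can limit the audience attribute to beginner, intermediate, or advanced. Adding a new value is then a one-line change to the map, with no DTD work.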
A subject scheme might be particularly useful if your company offers a large variety of products, and you need a better way to organize your content for search. The controlled values that you define in a subject scheme map can be used to create facets, which your audience can use to filter your content during search. With faceted search, you can offer your audience a more sophisticated way of finding content than a table of contents, an index, or full-text search can provide, and increase their chances of accessing the content they need.