January 8, 2013

Perplexed by complex syntax: understanding syntax diagrams in DITA

What DITA elements are available for syntax diagrams? And how does one go about using them?

The other day, a client asked for help with formatting syntax diagrams in DITA sources, using the standard metacharacters: brackets, braces, and bars (often called “Unix” syntax diagrams). Although I’ve long known about the DITA reference topic specialization and its refsyn and syntaxdiagram elements, I’d never had much call to use them.

Digging in the DITA specification and asking Mr. Google didn’t turn up much…other than the fact that the refsyn and syntaxdiagram elements exist and that Deborah Pickett has a plugin that will produce beautiful railway diagrams (once you get your syntax coded…but how do you encode the syntax?).

What I was looking for, but couldn’t find, was something that shows source and output of a few simple syntax diagrams: “To get this, do this.” The DITA specification comes close; it shows an example of the syntaxdiagram element, but fails to provide a corresponding example of what the output might look like (nor does the example descend to the attribute level, which is important).

After digging into a number of different sources, I was able to put together a reasonable example (which I’ll share here). But along the way, I discovered something somewhat disconcerting: I wouldn’t want to use any of it.

A test case

Let’s look at a simple command from a fictitious command line interpreter. Call it “doit.”

The doit command has two required parameters (represented by the variables in and out) and two optional command line switches (/a and /b). The /b switch requires one of three keywords: low, medium, or high. In a “Unix” syntax diagram, the doit command might be represented like this:

doit in out [/a] [/b {low | medium | high }]

There are essentially three ways to encode this syntax in DITA, which range from inappropriate use of tags (codeblock, currently used by the client) up through properly tagged content (syntaxdiagram, which turns out to be impractical).

Using codeblock

It is possible to put syntax diagrams in the codeblock element, but this borders on tag abuse (using elements for the desired appearance, rather than for their semantic meaning). However, this is the most readable of the three approaches. To use syntax metacharacters (braces, brackets, and bars), you’ll need to hard-code them in the text. (I added line breaks for readability.)

<codeblock>
doit <varname>in</varname> <varname>out</varname>
[/a] [/b {low | medium | high }]
</codeblock>

For further precision, you can use some inline elements to identify the contents of your syntax diagram:

  • cmdname – The name of the command
  • keyword – A keyword
  • varname – A variable


<codeblock>
<cmdname>doit</cmdname> <varname>in</varname> <varname>out</varname>
[/<keyword>a</keyword>]
[/<keyword>b</keyword>
{<keyword>low</keyword> | <keyword>medium</keyword> | <keyword>high</keyword> }]
</codeblock>

Using synph

The synph element is a step up from the codeblock. First, it’s much more semantically appropriate. Second, because it is more correct semantically, you can modify the transforms in the DITA OT to give your syntax diagrams a different appearance from code examples. However, the synph element is (supposedly) an inline element, and syntax examples usually begin on a new line. Using an inline element and expecting block-like behavior is a questionable practice. But again, this form is fairly readable. As with codeblock, you’ll have to hard-code syntax metacharacters.

The synph element can contain inline elements that identify the keywords, variables, and other aspects of the syntax. These are different from the elements used in a codeblock element. The DITA elements for semantic markup of synph contents include:

  • kwd – Keywords (by default, formatted in bold in OT HTML transforms, and no style applied in OT PDF transform)
  • var – Variables (formatted with italics)
  • delim – Delimiters, such as quotation marks, slashes, or hyphens
  • sep – Separators, such as required commas separating list items

There are other elements; the ones in the preceding list are important to our discussion. See the DITA specification for full details. The delim and sep elements do not provide any special formatting behavior, although the plugins can be extended to handle them differently.


<synph>doit <var>in</var> <var>out</var>
[/<kwd>a</kwd>]
[/<kwd>b</kwd> {<kwd>low</kwd> | <kwd>medium</kwd> | <kwd>high</kwd> }]
</synph>
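
To give a rough idea of "to get this, do this," here is a minimal Python sketch that turns the synph example above into HTML-like output. It is not the DITA OT transform; it only applies the formatting described in the list above (kwd in bold, var in italics), and the function name synph_to_html and the tag mapping are my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

SYNPH = """<synph>doit <var>in</var> <var>out</var>
[/<kwd>a</kwd>]
[/<kwd>b</kwd> {<kwd>low</kwd> | <kwd>medium</kwd> | <kwd>high</kwd> }]
</synph>"""

# Assumed mapping (not the OT's actual stylesheet): kwd -> bold, var -> italics.
HTML_TAG = {"kwd": "b", "var": "i"}


def synph_to_html(source: str) -> str:
    """Render a flat synph element as a single line of HTML-like text."""
    root = ET.fromstring(source)
    parts = [escape(root.text or "")]
    for child in root:
        text = escape(child.text or "")
        tag = HTML_TAG.get(child.tag)
        # Elements without a mapping (delim, sep, ...) pass through unstyled.
        parts.append(f"<{tag}>{text}</{tag}>" if tag else text)
        # The tail holds the hard-coded metacharacters that follow the element.
        parts.append(escape(child.tail or ""))
    # Collapse the line breaks that were added only for readability.
    return " ".join("".join(parts).split())


print(synph_to_html(SYNPH))
```

Note that the brackets, braces, and bars survive only because they were typed into the source as literal text; nothing in the markup generates them.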

Using syntaxdiagram

The syntaxdiagram element is the proper block-mode element to use for syntax diagrams. The elements used within syntaxdiagram allow the DITA Open Toolkit to format the syntax diagram using a variety of representations. The default is to use the syntax metacharacters, but there is a plugin that creates railway diagrams (http://tech.dir.groups.yahoo.com/group/dita-users/message/14504).

The three most useful elements within the syntaxdiagram element are groupchoice, groupseq, and groupcomp. The other elements allowed in syntaxdiagram are for depicting the diagram in sub-pieces or for attaching notes. Here’s when to use the three group elements:

  • groupseq – Contains a sequence of elements that must occur in the order shown.
  • groupchoice – Contains a number of elements from which you can make a choice. The importance attribute indicates whether the contents are optional or required.
  • groupcomp – Contains a sequence of elements that must be formatted close together.

You can nest any of these three group elements inside other group elements.

The same inline elements used in synph can be used in the syntaxdiagram group elements, so each piece of the syntax can be identified precisely.


<syntaxdiagram>
<groupseq>
<kwd>doit</kwd>
<var>in</var>
<var>out</var>
<groupchoice importance="optional">
<groupseq>
<sep>/</sep>
<kwd>a</kwd>
</groupseq>
</groupchoice>
<groupchoice importance="optional">
<groupseq>
<sep>/</sep>
<kwd>b</kwd>
<groupchoice importance="required">
<kwd>low</kwd>
<kwd>medium</kwd>
<kwd>high</kwd>
</groupchoice>
</groupseq>
</groupchoice>
</groupseq>
</syntaxdiagram>
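
To make the mapping from group elements to metacharacters concrete, here is a minimal Python sketch that walks the markup above and prints the bracket-and-brace form. It is not the DITA OT's implementation. The rules it applies are my assumptions about what the output should look like: an optional groupchoice gets brackets, a required groupchoice with several alternatives gets braces and bars, and a sep or delim attaches to the item that follows it.

```python
import xml.etree.ElementTree as ET

DIAGRAM = """<syntaxdiagram>
<groupseq>
<kwd>doit</kwd><var>in</var><var>out</var>
<groupchoice importance="optional">
<groupseq><sep>/</sep><kwd>a</kwd></groupseq>
</groupchoice>
<groupchoice importance="optional">
<groupseq><sep>/</sep><kwd>b</kwd>
<groupchoice importance="required">
<kwd>low</kwd><kwd>medium</kwd><kwd>high</kwd>
</groupchoice>
</groupseq>
</groupchoice>
</groupseq>
</syntaxdiagram>"""

LEAVES = ("kwd", "var", "sep", "delim")


def render(elem: ET.Element) -> str:
    """Render a syntaxdiagram subtree as "Unix" syntax notation."""
    if elem.tag in LEAVES:
        return (elem.text or "").strip()
    children = list(elem)
    rendered = [render(child) for child in children]
    if elem.tag == "groupchoice":
        body = " | ".join(rendered)
        if elem.get("importance") == "optional":
            return f"[{body}]"
        # Braces are only needed when there is a real choice to make.
        return f"{{{body}}}" if len(rendered) > 1 else body
    if elem.tag == "groupcomp":
        # groupcomp: items are formatted close together, with no spaces.
        return "".join(rendered)
    # groupseq (and the syntaxdiagram root): items in order, space-separated,
    # except that a sep or delim is glued to the item that follows it.
    out = ""
    for i, (child, text) in enumerate(zip(children, rendered)):
        out += text
        if i < len(children) - 1 and child.tag not in ("sep", "delim"):
            out += " "
    return out


print(render(ET.fromstring(DIAGRAM)))
# doit in out [/a] [/b {low | medium | high}]
```

Twenty-odd lines of markup collapse back into the one-line diagram we started with, which is a fair measure of how much the tagging costs in readability.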

However, by this point, most of you are saying, “Ugh!”

And that’s my reaction, too. In fact there are several shortcomings to the DITA implementation of syntax diagrams.

  • The contents of the syntaxdiagram element are opaque. Unless you’re well versed in the syntaxdiagram elements, the meaning of the content is inscrutable. It would be great if there were a visual editor for this type of content (as there is for MathML-encoded mathematical equations).
  • The contents of synph and syntaxdiagram are entirely different. I like to think of the codeblock and codeph elements as two halves of the same whole; I use codeblock for a full example and codeph as inline text when quoting snippets from the code example. The elements used in codeph are a reasonable subset of the elements used in codeblock. However, when I use syntaxdiagram for the full syntax, the Open Toolkit handles the syntax metacharacters; when I use synph, I have to provide them myself.
  • I’m surprised that synph and syntaxdiagram don’t provide a separate element for the command name. I don’t like the idea of using the kwd element (because I see the command name as being separate from other parts of the command). What’s more, it might be useful to do a faceted search on individual commands, which a separate element would facilitate.
  • The DITA Open Toolkit does not correctly implement the output for the group elements in a syntaxdiagram. When groupchoice is used with importance="optional", the Open Toolkit surrounds the content in both braces and brackets. I have never encountered this in practice, so I’m presuming it’s an error. I was able to override the behavior in my plugins.
    Similarly, there is no special formatting applied for the kwd or keyword element. Second-guessing the Open Toolkit group, this might be a deliberate choice, as these may be applied simply to identify content, rather than indicate formatting. Again, I overrode this behavior in my plugins. (Note that I only tested what I needed to implement; I can’t vouch for the completeness of the DITA OT implementation of the other elements allowed in syntaxdiagram.)

So what did we do?

Of the approaches, using synph seems to make the most sense (and that’s what our client chose to do). It keeps syntax diagrams readable and easily editable, but it also distinguishes them semantically from code examples.

I would use syntaxdiagram only if required by a client. A visual editor for syntaxdiagram would help a great deal. If developers’ sources for the command language were available, I would also search for (or build) a tool to convert the sources directly into syntaxdiagram elements. But again, only if required.
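
If I did have to build such a converter, its core might look something like the following minimal Python sketch. It is only an illustration under several assumptions of mine: the "source" is a Unix-style syntax string rather than real developer sources, the caller supplies the set of variable names (plain text can't tell a variable from a keyword), a leading slash marks a switch, and the function names are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# A token is a single metacharacter or a run of other non-space characters.
TOKEN = re.compile(r"[\[\]{}|]|[^\s\[\]{}|]+")


def make(tag, text):
    elem = ET.Element(tag)
    elem.text = text
    return elem


def word(tok, variables):
    """Turn a plain word into sep/kwd/var elements (assumed conventions)."""
    if tok.startswith("/"):
        return [make("sep", "/"), make("kwd", tok[1:])]
    return [make("var" if tok in variables else "kwd", tok)]


def sequence(items):
    group = ET.Element("groupseq")
    group.extend(items)
    return group


def parse(tokens, variables, closer=None):
    """Consume tokens up to `closer`; return a list of alternatives."""
    alternatives = [[]]
    while tokens:
        tok = tokens.pop(0)
        if tok == closer:
            return alternatives
        if tok in ("]", "}"):
            raise ValueError(f"unexpected {tok!r}")
        if tok == "|":
            alternatives.append([])
        elif tok in ("[", "{"):
            inner = parse(tokens, variables, "]" if tok == "[" else "}")
            importance = "optional" if tok == "[" else "required"
            choice = ET.Element("groupchoice", importance=importance)
            for alt in inner:
                # A lone item needs no wrapper; several items form a sequence.
                choice.append(alt[0] if len(alt) == 1 else sequence(alt))
            alternatives[-1].append(choice)
        else:
            alternatives[-1].extend(word(tok, variables))
    if closer is not None:
        raise ValueError(f"missing {closer!r}")
    return alternatives


def to_syntaxdiagram(syntax, variables):
    alternatives = parse(TOKEN.findall(syntax), set(variables))
    root = ET.Element("syntaxdiagram")
    if len(alternatives) == 1:
        root.append(sequence(alternatives[0]))
    else:
        # Bars at the top level become one required choice.
        choice = ET.SubElement(root, "groupchoice", importance="required")
        choice.extend(sequence(alt) for alt in alternatives)
    return root


root = to_syntaxdiagram("doit in out [/a] [/b {low | medium | high }]", {"in", "out"})
print(ET.tostring(root, encoding="unicode"))
```

For the doit command, this yields the same nesting as the hand-coded syntaxdiagram shown earlier, which suggests the markup is better generated than typed.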

Do you have to include syntax diagrams in your DITA topics? What has been your experience?