XML for Documentation

I wanted to write some reasonable documentation for the bt (B tree) library. I figured I could write it in HTML, making publishing on the web easy, or I could write it in man page style, making man pages easy to generate. But then I thought: isn't XML supposed to solve exactly this kind of problem? Write it in XML and transform it to HTML, or man, or anything else in the future...

What I needed was a predefined schema or DTD for writing documentation. A little searching turned up something called DocBook - an XML schema designed for just this purpose. The links to the various sites related to DocBook are confusing, and some are just broken. In particular, any reference that directs you to http://www.oasis-open.org/docbook just ends up at the Technical Committee page, with no useful information at all. The latest DTD for DocBook is available for download, and you can also get hold of the DocBook Guide, by Norman Walsh.

I downloaded the DTD for version 4.2 as a zip file. Extremely useful instructions for using DocBook in conjunction with the psgml Emacs modes used to be found at the Red Hat Web Application Framework and Red Hat Enterprise CMS site. Unfortunately, this site is now defunct. Perhaps try the Open ACS site instead.

The key difficulty I found in getting psgml working in conjunction with the downloaded DocBook DTD was that the sample DocBook XML documents I had seen used out-of-date <!DOCTYPE ... > declarations, so the samples did not work with the latest downloaded DTD.

I had followed all the instructions from the Red Hat site, but psgml kept complaining that it couldn't find the DTD.

I reviewed what I had done. First, I had created a dtd directory under my home directory, then a dtd/docbook-xml subdirectory, and extracted all downloaded DocBook files into the dtd/docbook-xml subdirectory.

    mkdir -p dtd/docbook-xml
    cd dtd/docbook-xml
    unzip ~/import/docbook-xml-4.2.zip

Then I had created a file called CATALOG in the dtd directory with the following contents:

CATALOG "docbook-xml/docbook.cat"

The CATALOG file acts as a link to all the other catalogs you might have downloaded (only the DocBook catalog in my case). I had added it to the psgml variable that lists the catalog files:

(add-to-list 'sgml-catalog-files "~/dtd/CATALOG")

Everything seemed to be as required. What was going wrong?

The DOCTYPE tag contains the following elements:

<!DOCTYPE book PUBLIC "public name" "system name">

and here's the one I was using as a test:

<!DOCTYPE book PUBLIC "-//AD//DTD DocBook XML V4.1.2-Based Extension//EN" "docbookx.dtd">

psgml first attempts to locate the public definition via any catalogs you have defined, then, if that fails, it tries to locate the system definition. First, I discovered that if I changed the system name to the full relative pathname of docbookx.dtd (i.e. "dtd/docbook-xml/docbookx.dtd"), psgml found the DTD. Hmm, but isn't that what the catalog is supposed to eliminate? Well, no, as it turned out. On looking at the contents of the supplied docbook.cat file (as referenced by the CATALOG file), I found that the public name was not the same as the one in my test example. It was:

"-//OASIS//DTD DocBook XML V4.2//EN"

Ah ha! By changing the public name to that actually held on my system, psgml found the DTD by the public name, and did not need to resort to the system name. By the way, if you think once you've got a correct public name, you can leave the system name out, think again - the system name is mandatory.
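
The mismatch above is easy to check for mechanically. Here is a minimal shell sketch, assuming the DOCTYPE declaration sits on a single line (as in the examples above) and that the catalog quotes its public identifiers in double quotes. The function names doctype_public_id and catalog_has_public_id are my own, not part of psgml or DocBook.

```shell
#!/bin/sh
# Print the public identifier from the first DOCTYPE declaration in a file.
# Assumption: the whole declaration is on one line.
doctype_public_id() {
    sed -n 's/.*<!DOCTYPE[^>]*PUBLIC[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed if the catalog file mentions the given public identifier in quotes.
# Arguments: identifier, catalog file.
catalog_has_public_id() {
    grep -F -q -e "\"$1\"" "$2"
}
```

Running catalog_has_public_id "$(doctype_public_id ~/bt.xml)" ~/dtd/docbook-xml/docbook.cat and looking at the exit status should show whether the document and the catalog agree on the public name.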

To complement DocBook, there is a set of tools to convert DocBook XML into a variety of formats. You can download them from Red Hat as a set of RPMs.

Now I have an XML schema to write my documentation. What can stop me now?

Quite a bit, actually. I seem unable to use the XML DocBook definition with the DocBook tools. Openjade complains that "X00E" is not a function name, for every non-ASCII character defined in the entity files. I think I've followed all the guidance on ensuring that the SGML declaration for XML is listed on the Openjade command line before the DocBook source, but I still get the same problem.

Some time later

OK, found the problem (but it took a while). Before I found the cause, I had replaced openjade 1.3.1 with openjade 1.3.2, deleted the DocBook 4.2 DTDs, got hold of the latest stylesheets (1.78 instead of 1.76), and generally confused the situation even more. Finally, I decided to remove as much of what I had installed as possible and start from scratch. That didn't solve the problem either.

It turns out that the DSSSL stylesheets need the SGML entity definitions, whereas XML documents need the XML definitions. The XML definitions are in the form

<!ENTITY cularr "&#x21B6;"> <!-- ANTICLOCKWISE TOP SEMICIRCLE ARROW -->

whereas the SGML definitions look like

<!ENTITY cularr SDATA "[cularr]"--/curvearrowleft A: left curved arrow -->
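
To tell which flavour a given entity file holds, something like the following shell sketch may help. It is only a heuristic based on the two forms shown above: SGML files use the SDATA keyword, while XML files use quoted hexadecimal character references. The function name entity_flavor is hypothetical.

```shell
#!/bin/sh
# Guess whether an entity file holds SGML or XML entity definitions.
# Heuristic only: SGML definitions use SDATA, XML ones use "&#x....;".
entity_flavor() {
    if grep -q 'SDATA' "$1"; then
        echo sgml
    elif grep -q '"&#x[0-9A-Fa-f]*;"' "$1"; then
        echo xml
    else
        echo unknown
    fi
}
```

Pointing it at each entity file that a catalog resolves to is a quick way to see whether openjade is being handed the SGML set or the XML set.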

To enable openjade to pick up the SGML entity definitions before it processed the stylesheet, I created a local catalog file like this:

CATALOG "/usr/share/sgml/sgml-iso-entities-8879.1986/catalog"
CATALOG "/usr/share/sgml/openjade-1.3.1/dsssl/catalog"
CATALOG "/usr/share/sgml/docbook-dsssl-1.78/catalog"
CATALOG "/usr/share/sgml/docbook/sgml-dtd-4.1-1.0-8/catalog"
CATALOG "/usr/share/sgml/docbook/xml-dtd-4.1.2-1.0-8/catalog"
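
A catalog like this is fragile, because the paths embed package version numbers. The following is a rough shell sketch for listing CATALOG entries whose target does not exist. It assumes one CATALOG entry per line with the path in double quotes, and it resolves relative paths against the directory holding the catalog file. The function name missing_catalogs is my own invention.

```shell
#!/bin/sh
# Print the target of every CATALOG entry that does not exist on disk.
# Relative targets are resolved against the catalog file's own directory.
missing_catalogs() {
    cat_file=$1
    cat_dir=$(dirname "$cat_file")
    sed -n 's/^[[:space:]]*CATALOG[[:space:]]*"\([^"]*\)".*/\1/p' "$cat_file" |
    while IFS= read -r target; do
        case $target in
            /*) path=$target ;;            # absolute path: use as-is
            *)  path=$cat_dir/$target ;;   # relative path: anchor at the catalog
        esac
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}
```

If missing_catalogs ~/catalog prints nothing, every catalog listed there is at least present. It says nothing about whether the contents are right.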

Since I had moved the catalog file (to my home directory), I changed my .emacs file to add the new catalog file location to the sgml-catalog-files variable.

openjade is invoked to produce the HTML output, like so:

export SGML_CATALOG_FILES=/home/mark/catalog
openjade -t xml -i html -d /usr/share/sgml/docbook/html/ldp.dsl\#html /usr/local/share/OpenSP/xml.dcl ~/bt.xml
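
Since a missing file tends to produce confusing errors, it may be worth wrapping the command above in a small script that checks its inputs first. This is only a sketch: the paths and openjade options are copied from the command above, while the variable names CATALOG_FILE, STYLESHEET and XML_DCL and the function build_html are hypothetical.

```shell
#!/bin/sh
# Paths copied from the command above; override them in the environment
# if yours differ.
CATALOG_FILE=${CATALOG_FILE:-/home/mark/catalog}
STYLESHEET=${STYLESHEET:-/usr/share/sgml/docbook/html/ldp.dsl}
XML_DCL=${XML_DCL:-/usr/local/share/OpenSP/xml.dcl}

# build_html (hypothetical helper): check that every input is readable,
# then run openjade with the same options as above.
build_html() {
    src=$1
    for f in "$CATALOG_FILE" "$STYLESHEET" "$XML_DCL" "$src"; do
        if [ ! -r "$f" ]; then
            echo "build_html: cannot read $f" >&2
            return 1
        fi
    done
    # The XML declaration must come before the document on the command line.
    SGML_CATALOG_FILES=$CATALOG_FILE \
        openjade -t xml -i html -d "${STYLESHEET}#html" "$XML_DCL" "$src"
}
```

With that in place, build_html ~/bt.xml either runs openjade or names the first file it could not read.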

Now I can use Emacs to write DocBook-style documentation, and openjade to produce HTML from it. It also works for RTF, but I haven't tested it for nroff (i.e. man pages) yet. I suspect I may have to re-install the docbook-utils for that to work.