Docbook and XHTML

Previously, I'd wrestled with Docbook markup for documentation. The toolchain I used produced HTML 4.01, whereas the rest of the website is XHTML. As part of a drive to improve the consistency of the website, which involved writing a dead link checker and an XHTML checker in Python, I found that these HTML documentation pages were a pain.

This was for two reasons. Firstly, the two different standards complicated the generation of pages for the website. Secondly, the Python XML parser I was using to validate pages, xml.sax, could not handle HTML, since HTML is not required to be well-formed XML. The HTML parser offered by Python would only identify very gross errors in the HTML code, which made it fairly useless for checking syntax. I could, of course, have used the callbacks offered by the HTMLParser object to write the validation myself, but this would have taken more time than I was prepared to spend.
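To illustrate the second point, here is a minimal sketch of a well-formedness check built on xml.sax. The function name is_well_formed and the two sample fragments are made up for the example; this is not the checker I actually wrote, just the general shape of the problem.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(markup: bytes) -> bool:
    """Return True if markup parses as well-formed XML, False otherwise."""
    try:
        # A bare ContentHandler ignores every event; only parse errors matter.
        xml.sax.parseString(markup, ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


# Hypothetical fragments: the first closes <br/> as XHTML requires,
# the second leaves <br> open, which HTML 4.01 allows.
xhtml_fragment = b"<p>First line<br/>second line</p>"
html_fragment = b"<p>First line<br>second line</p>"

if __name__ == "__main__":
    print(is_well_formed(xhtml_fragment))
    print(is_well_formed(html_fragment))
```

The unclosed <br> in the second fragment is perfectly legal HTML 4.01, yet an XML parser rejects it as a mismatched tag, which is why the same checker could not be pointed at both kinds of page.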

So, was there an XSL stylesheet to transform Docbook to XHTML? Of course there was; the Docbook project on Sourceforge provides a slew of tools to handle the Docbook format. I already had a copy of xsltproc on my Debian box to perform the translation.

As there were a couple of parameters I needed to set, I created a simple stylesheet customisation layer. The first parameter setting causes function synopsis elements to be rendered in ANSI C format, rather than the default K&R. The second includes the standard hydrus CSS stylesheet in the generated XHTML pages. The local layer is shown below.

  <?xml version='1.0'?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:import href="/home/mark/doc/docbook-xsl-1.69.1/xhtml/chunk.xsl"/>
  <xsl:param name="funcsynopsis.style" select="'ansi'"/>
  <xsl:param name="html.stylesheet" select="'/styles/style.css'"/>
  </xsl:stylesheet>
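
Applying a layer like this is a matter of handing it to xsltproc in place of the stock stylesheet. A minimal sketch of driving that from Python follows; the file names local.xsl and bt.xml and the output directory html/ are placeholders I have made up for the example, not the real paths. With a chunking stylesheet, xsltproc treats an --output value ending in a slash as the directory for the generated files, and that directory must already exist.

```python
import subprocess


def xsltproc_command(stylesheet, source, output_dir):
    """Build the argument list for one xsltproc run."""
    # --output with a trailing slash names the directory for chunked output.
    return ["xsltproc", "--output", output_dir, stylesheet, source]


def transform(stylesheet, source, output_dir):
    """Run xsltproc and raise CalledProcessError if it reports failure."""
    subprocess.run(xsltproc_command(stylesheet, source, output_dir), check=True)


if __name__ == "__main__":
    # Placeholder names; substitute the real customisation layer and document.
    print(" ".join(xsltproc_command("local.xsl", "bt.xml", "html/")))
```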

There was one problem: the XHTML generated for the chapter describing the B Tree test harness, bt, was not valid XHTML. This was caused by my use of the cmdsynopsis tag within the term tag of a variablelist element. The Docbook to XHTML stylesheet converted a variablelist into the following XHTML tags:

  <dl>
  <dt>term entry</dt>
  <dd>description</dd>
  </dl>

The stylesheet placed <div> and <p> elements around the body of the command synopsis. XHTML, however, does not permit these block-level elements within a <dt> element, which may contain only inline content.
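
A check for this particular problem can be sketched in a few lines of Python. The sample markup below is hand-written to imitate the shape of the problem rather than copied from the stylesheet output, the function name is hypothetical, and BLOCK_TAGS is only an illustrative subset of the block-level elements in the XHTML DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Illustrative subset of block-level elements; not the full list from the DTD.
BLOCK_TAGS = {"div", "p", "table", "pre", "ul", "ol", "dl"}


def block_elements_in_dt(xhtml):
    """Return the local names of block-level elements nested inside any <dt>."""
    root = ET.fromstring(xhtml)
    found = []
    for dt in root.iter("{%s}dt" % XHTML_NS):
        for child in dt.iter():
            # Strip the "{namespace}" prefix that ElementTree puts on tags.
            local_name = child.tag.split("}")[-1]
            if child is not dt and local_name in BLOCK_TAGS:
                found.append(local_name)
    return found


# Hand-written sample: a command synopsis wrapped in <div> and <p> inside <dt>.
sample = (
    '<dl xmlns="http://www.w3.org/1999/xhtml">'
    "<dt><div><p><code>bt</code> file</p></div></dt>"
    "<dd>description</dd>"
    "</dl>"
)

if __name__ == "__main__":
    print(block_elements_in_dt(sample))
```

A validating parser working from the DTD is the proper tool for this; the sketch only shows why the <dt> content was rejected.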

As a quick workaround, I modified synop.xsl to eliminate the generation of <div> and <p> elements around <cmdsynopsis>. To see if this was a defect in the XHTML stylesheets, I asked the question on the docbook-apps mailing list. The consensus was that it would always be possible to generate illegal XHTML from Docbook, as Docbook has a less constrained content model than XHTML. The parameter variablelist.as.table was brought to my attention, which causes variablelists to be rendered as tables. By adding the line:

  <xsl:param name="variablelist.as.table" select="1"/>

to the local customisation layer, I ended up with a tabular presentation. Unfortunately, the visual effect was much less attractive: the longer commands became hard to read because they were folded to fit within the automatically generated column widths. I decided to explicitly rework the command list as a table, using spanspecs to produce the layout I desired. You can view the effect of the rework. The benefit of this change is that I no longer required the patch I'd applied to synop.xsl.

Now I was able to move on to writing the Python scripts to perform automatic validation of the website (see the next journal entry).