When I first started this website, I chose to write the web pages in
XHTML (specifically -//W3C//DTD XHTML 1.0 Transitional//EN). It seemed like the right thing to do at
the time. I wrote a Python program that used libxml2 to
validate web pages before they were published to the site. This
worked well for many years.
Much later, when I wanted to take advantage of the new HTML5 Canvas
feature, I found I had to introduce relaxed validation for those
pages, which were flagged with a .htmx suffix. Then, I
added a mandoc-generated man page for remind. This was
also HTML5. I began to think that the cool kids had left me behind.
In fact, it looked like the whole world had moved on. Perhaps I
should as well.
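The suffix-based switch described above needs very little logic. The following is a minimal sketch, assuming that .htmx is the only suffix that selects relaxed checking. The mode names and the validation_mode function are hypothetical; the original script is not shown.

```python
from pathlib import Path

# Hypothetical validation modes; the original script's names are unknown.
STRICT_XHTML = "strict-xhtml"
RELAXED_HTML5 = "relaxed-html5"

# Suffixes that opt a page out of strict XHTML validation
# (assumption: only .htmx is special, as described above).
RELAXED_SUFFIXES = {".htmx"}


def validation_mode(page: str) -> str:
    """Pick a validation mode from the page's file suffix."""
    suffix = Path(page).suffix.lower()
    return RELAXED_HTML5 if suffix in RELAXED_SUFFIXES else STRICT_XHTML
```

For example, validation_mode("canvas-demo.htmx") selects the relaxed mode, and validation_mode("index.html") keeps strict XHTML checking.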
To convert to HTML5, I first needed something to validate pages. I started with a Python module, html5lib. While this worked, it was very slow: it took over two seconds to validate a page on my server, whereas the existing XHTML check took under a second. I then found tidy, which is more of a markup corrector. However, it also warned of markup errors and was about as fast as the XHTML validation. Even better, it came with a command-line program, so I didn't need a Python front-end.
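If the publishing step still wants a single pass/fail verdict, tidy's exit status is enough. HTML Tidy exits with 0 when there are no issues, 1 when there are warnings only, and 2 when there are errors. The -errors option suppresses the corrected markup, and -quiet suppresses nonessential output, so what remains on stderr is the diagnostics. The following is a sketch, not the author's actual publishing step; check_page and classify_tidy_status are hypothetical names.

```python
import subprocess

# HTML Tidy exit codes: 0 = clean, 1 = warnings only, 2 = errors found.
TIDY_STATUS = {0: "ok", 1: "warnings", 2: "errors"}


def classify_tidy_status(returncode: int) -> str:
    """Map a tidy exit code to a short verdict."""
    return TIDY_STATUS.get(returncode, "unknown")


def check_page(path: str) -> tuple[str, str]:
    """Run tidy purely as a checker and return (verdict, diagnostics).

    -errors: report problems only, do not print corrected markup.
    -quiet: suppress nonessential output.
    Tidy writes its warnings and errors to stderr.
    """
    result = subprocess.run(
        ["tidy", "-errors", "-quiet", path],
        capture_output=True,
        text=True,
    )
    return classify_tidy_status(result.returncode), result.stderr
```

A publish script built on this could, for instance, refuse to upload a page whose verdict is "errors" and merely print the diagnostics for "warnings".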
That left the process of converting the site template to HTML5 and modifying page contents to remove deprecated or illegal constructs. These were mainly:
These corrections were mainly driven by the W3C Validator, which emitted warnings and errors that tidy ignored.
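The post does not say how the W3C Validator was used; the web form is the likely route. The W3C's HTML5 checker (the Nu Html Checker) also accepts a page POSTed to https://validator.w3.org/nu/?out=json and returns a JSON report. In that report, errors have type "error", and warnings are "info" messages with subType "warning". The sketch below assumes that interface. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.request

# Nu Html Checker endpoint; out=json requests a JSON report.
NU_CHECKER = "https://validator.w3.org/nu/?out=json"


def summarize(report: dict) -> dict:
    """Count errors and warnings in a Nu checker JSON report."""
    counts = {"errors": 0, "warnings": 0}
    for msg in report.get("messages", []):
        if msg.get("type") == "error":
            counts["errors"] += 1
        elif msg.get("type") == "info" and msg.get("subType") == "warning":
            counts["warnings"] += 1
    return counts


def check_with_w3c(html: bytes) -> dict:
    """POST a page to the Nu checker and summarize its response."""
    request = urllib.request.Request(
        NU_CHECKER,
        data=html,
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "User-Agent": "site-publisher-sketch",  # hypothetical name
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize(json.load(response))
```

Because this depends on a remote service, it fits better as an occasional audit than as a gate in every publish.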
The increased importance of CSS also led me to clean up the existing style sheets.
I did think about using html5lib as a validator separate from the publishing process, but it seems no more sensitive than tidy to the errors identified by the W3C validator.