Migrating from XHTML to HTML5

When I first started this website, I chose to write the web pages in XHTML (specifically -//W3C//DTD XHTML 1.0 Transitional//EN). It seemed like the right thing to do at the time. I wrote a Python program, using libxml2 capabilities, to validate web pages before they were published to the site. This worked well for many years.

Much later, when I wanted to take advantage of the new HTML5 Canvas feature, I found I had to introduce a relaxed validation for these pages, which were flagged by a suffix of .htmx. Then I added a mandoc-generated man page for remind. This was also HTML5. I began to think that the cool kids had left me behind. In fact, it looked like the whole world had moved on. Perhaps I should as well.

To convert to HTML5, I first needed something to perform page validation. I started with a Python module, html5lib. While this worked, it was very slow to validate a page (over two seconds on my server), whereas the existing XHTML checking was sub-second. I then found tidy, which is more of a markup corrector. However, it also warns of markup errors and runs at a similar speed to the XHTML validation. Even better, it comes with a command-line program, so I didn't need any Python front-end.
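
The command-line program is all that is needed, but for anyone who does want to drive tidy from a script, a minimal Python sketch might look like the following. This is not the publishing script used for this site. The function names are hypothetical, and the sketch assumes tidy's usual conventions: an exit status of 0 for a clean page, 1 for warnings and 2 for errors, with diagnostics of the form "line N column M - Warning: ..." written to standard error.

```python
import re
import shutil
import subprocess

# Assumed diagnostic format: line 3 column 1 - Warning: <img> lacks "alt" attribute
DIAGNOSTIC = re.compile(r"^line (\d+) column (\d+) - (Warning|Error): (.*)$")


def parse_tidy_report(report):
    """Turn tidy's diagnostic text into (line, column, level, message) tuples."""
    found = []
    for raw in report.splitlines():
        match = DIAGNOSTIC.match(raw.strip())
        if match:
            line, column, level, message = match.groups()
            found.append((int(line), int(column), level, message))
    return found


def check_page(path):
    """Run tidy on one file and return (exit_status, diagnostics)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy is not installed or not on PATH")
    # -e: report errors and warnings only; -q: suppress nonessential output
    result = subprocess.run(
        ["tidy", "-e", "-q", path], capture_output=True, text=True
    )
    return result.returncode, parse_tidy_report(result.stderr)
```

A publishing step could then refuse to upload any page for which check_page returns a non-zero status, or only those with a status of 2 if warnings are acceptable.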

That left the process of converting the site template to HTML5 and modifying page contents to remove deprecated or illegal constructs. These were mainly:

Image and paragraph alignments (align and valign), which should now be performed in CSS.
Table formatting (e.g. cellspacing) and border control, which are likewise now performed in CSS.
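
To give an idea of how such constructs could be located across many pages, here is a small Python sketch using only the standard library's html.parser. It is an illustration rather than the method actually used for this site. The class and function names are hypothetical, and the attribute set covers only the attributes named above, so it is not a complete list of what HTML5 treats as obsolete.

```python
from html.parser import HTMLParser

# Presentational attributes mentioned in the text; illustrative, not exhaustive.
PRESENTATIONAL = {"align", "valign", "cellspacing", "border"}


class PresentationalAttributeFinder(HTMLParser):
    """Collect (line, tag, attribute) for each presentational attribute seen."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # getpos() reports the position of the tag currently being handled
        line, _column = self.getpos()
        for name, _value in attrs:
            if name in PRESENTATIONAL:
                self.findings.append((line, tag, name))


def find_presentational_attributes(markup):
    """Return all presentational attributes found in a string of markup."""
    finder = PresentationalAttributeFinder()
    finder.feed(markup)
    finder.close()
    return finder.findings
```

Each finding names the line, the element and the attribute, which is enough to decide which CSS rule should replace it.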

These corrections were mainly driven by the W3C Validator. It emitted warnings and errors ignored by tidy.

The increased importance of CSS also led me to clean up the existing style sheets.

I did think about using html5lib as a validator separate from the publishing process, but it seems no more sensitive to the errors identified by the W3C validator than tidy.