Web Publishing using make
When I originally built the hydrus web site, I used frames to present a static navigation bar, with another frame for the page content. This made it easy to define the page contents, without the overhead of repeating the standard images/navigation on each page. However, for visitors linking directly to a page, it made it difficult to view other sections of the site (and even made its identity obscure).
In order to remove frames, I needed some mechanism to support common content in each page - a template. As I envisaged that it would be necessary to change the page template to cater for new menu items, changes in style and so on, I had to develop an automated process to create the website from a template and specific page contents. In addition, I didn't want to be re-creating the whole website every time I made a small content change, so I needed something that would only make the minimal set of updates needed. This sounded like a job for make (or specifically GNU make, which is what I used).
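The "minimal set of updates" behaviour comes from make comparing file modification times. As a rough illustration only (this is not how GNU make is implemented, and the function name `needs_rebuild` is hypothetical), the rule can be sketched in Python 3:

```python
import os

def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild when at least one prerequisite was modified after the target.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

Under this rule, a page is regenerated when either its own content file or the page template is newer than the generated page, which is the dependency the makefiles below express.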
After a period of experimentation, I built a system which relied on
three makefiles and a couple of Python scripts. The master
Makefile requires two subordinate makefiles, compile.mk and
publish.mk. These are responsible for building the final pages and
installing them in the public HTML directory, respectively.
To perform the file processing required to merge the page contents
into a template, the Python script create-html.py is used. File
contents also need to be modified during processing; this
capability is provided by munge.py.
The system requires two directory trees under a root directory,
which also holds the page templates, makefiles and Python scripts. The
src subdirectory contains the basic page content, images and other
support files (e.g. the index file for the technical journal). The
obj subdirectory contains the results of combining the page contents
with the page template, plus copies of the non-page files. At this
point, the web site files can be examined to ensure correctness
before the final stage of publishing. The compile stage is invoked by
the following command, issued in the root directory:
make compile
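The compile step visits every directory under src and derives the matching obj directory by text substitution (compile.mk uses `${subst src,obj,${CURDIR}}`). A minimal Python 3 sketch of that mapping follows; the function names and example paths are hypothetical.

```python
import os

def target_dir(source_dir):
    """Mirror ${subst src,obj,...}: replace every 'src' in the path."""
    return source_dir.replace("src", "obj")

def planned_builds(src_root):
    """Yield (source_dir, target_dir) for each directory under src_root."""
    for dirpath, _dirnames, _filenames in os.walk(src_root):
        yield dirpath, target_dir(dirpath)

if __name__ == "__main__":
    print(target_dir("/home/mark/www/src/journal"))
    # A root path that itself contains "src" is rewritten as well:
    print(target_dir("/usr/src/www/src"))
```

Like make's `subst`, `str.replace` rewrites every occurrence, so this scheme assumes that "src" appears in the path only as the name of the source tree.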
The publishing process merely copies all newly modified files from
the obj directory to the web site location:
make publish
The process was complicated by the presence of HTML files generated
from Docbook markup. The jade-based conversion I used to turn
Docbook into HTML produces HTML 4.01, whereas the rest of the web site
conforms to XHTML 1.0. I therefore needed a slightly different
template for the Docbook HTML, as well as the ability to remove the
headers and footers from the generated HTML on the fly. This
latter requirement stemmed from the fact that the HTML files would
be regenerated whenever I had to revise the original Docbook XML
file.
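The remove-body command file used for this step is not reproduced on this page, so the following Python 3 sketch is only a guess at the kind of edit involved: it strips everything outside the `<BODY>` element, leaves a `<!-- DOCBOOK -->` marker in its place, and then picks a template based on that marker. The function names are hypothetical.

```python
import re

DOCBOOK_MARKER = "<!-- DOCBOOK -->"

def strip_docbook_wrapper(html):
    """Replace the header up to <BODY ...> with a marker; drop the footer."""
    # Everything up to and including the opening BODY tag becomes the marker.
    html = re.sub(r"\A.*?<BODY[^>]*>", DOCBOOK_MARKER, html,
                  count=1, flags=re.DOTALL)
    # The closing BODY tag and anything after it is removed.
    html = re.sub(r"</BODY\s*>.*\Z", "", html, count=1, flags=re.DOTALL)
    return html

def choose_template(html, normal, docbook):
    """Pick the Docbook template only for pages carrying the marker."""
    return docbook if DOCBOOK_MARKER in html else normal
```

Pages without a `<BODY>` tag pass through unchanged, so running the function again on an already stripped page is harmless.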
Addendum - 2nd October, 2005
The makefiles were modified to handle two new factors: the addition of a configuration control system (CVS) and the conversion to complete XHTML compliance (see Docbook and XHTML for details).
This made the processing considerably simpler. See Improved Web Publishing.
Makefile
# Makefile to process and publish hydrus web contents
#
#
# make compile - create pages by placing page contents into the standard
# page template. Docbook pages need a different template
# since they are in HTML 4.01.
#
# make publish - copy processed pages to web directory
#
#
# MODIFICATION HISTORY
# Mnemonic Date Rel Who
# www-publish 040615 1.0 mpw
# Written.
#
SRC := ./src
OBJ := ./obj
JOURNAL-SRC := ${SRC}/journal
PUB-DIR := /usr/local/www/data
TEMPLATE-NORMAL := page-template.html
TEMPLATE-DOCBOOK := page-template-docbook.html

.PHONY: compile publish clean template link

compile: template link
	find ${SRC} -type d -exec gmake -C {} -f ${CURDIR}/compile.mk \
	TEMPLATE-NORMAL=${CURDIR}/${TEMPLATE-NORMAL} \
	TEMPLATE-DOCBOOK=${CURDIR}/${TEMPLATE-DOCBOOK} ROOT=${CURDIR} \;
	cp ${TEMPLATE-NORMAL} ${OBJ}
	chmod 644 ${OBJ}/${TEMPLATE-NORMAL}

publish:
	find ${OBJ} -type d -exec gmake -C {} -f ${CURDIR}/publish.mk \
	ROOT=${CURDIR} PUB-DIR=${PUB-DIR} \;

# update internal links in journal if necessary
link:
	cd ${JOURNAL-SRC}; \
	newlink.py

clean:
	rm -rf ${OBJ}/*

# Update docbook template if standard template has changed
template: ${TEMPLATE-DOCBOOK}

${TEMPLATE-DOCBOOK}: ${TEMPLATE-NORMAL}
	cp ${TEMPLATE-NORMAL} ${TEMPLATE-DOCBOOK}
	munge.py -f create-docbook-template ${TEMPLATE-DOCBOOK}
compile.mk
# Makefile for constructing publishable html files from source html
# files and a template. The variables TEMPLATE-NORMAL and
# TEMPLATE-DOCBOOK should be passed as arguments on the make
# command line. One or other of them is used to create each target
# html page, and therefore all target files are dependent on them.
#
# MODIFICATION HISTORY
# Mnemonic Date Rel Who
# www-publish 040615 1.0 mpw
# Written.
#
# set target directory
TD := ${subst src,obj,${CURDIR}}

# define pattern rule for producing .html files in the target directory
# Docbook HTML files may have a body in them, so we remove the <BODY> tags
# and replace with a comment indicating this file is docbook html.
# N.B. This relies on using docbook2html (i.e. jade) to produce the HTML
# files; xmlto produces different signatures.
# This comment is used to determine if the docbook template should be used.
# If not, the normal template is applied.
${TD}/%.html : %.html
	grep "<BODY" $< >/dev/null ; \
	if [ $$? -eq 0 ]; then \
	  munge.py -f ${ROOT}/remove-body $< ; \
	fi
	grep "<!-- DOCBOOK -->" $< >/dev/null ; \
	if [ $$? -eq 0 ]; then \
	  ${ROOT}/create-html.py ${TEMPLATE-DOCBOOK} $< $@; \
	else \
	  ${ROOT}/create-html.py ${TEMPLATE-NORMAL} $< $@ ; \
	fi

# pattern rule to make non-html targets (images, support files, etc)
# note we ignore directories
${TD}/% : %
	if [ ! -d $< ]; then \
	  cp $< $@; \
	fi

# define list of targets (based on list of .html files in current directory)
OBJS := ${patsubst %,${TD}/%,${wildcard *.html}}

# define list of non-html targets
OTHER := ${patsubst %,${TD}/%,${filter-out %.html,${wildcard *}}}

all: ${TD} ${OBJS} ${OTHER}

${OBJS}: ${TEMPLATE-NORMAL} ${TEMPLATE-DOCBOOK}

# make target directory if necessary
${TD}:
	mkdir -p ${TD}
publish.mk
# Makefile for publishing html files from processed html pages
#
# MODIFICATION HISTORY
# Mnemonic Date Rel Who
# www-publish 040615 1.0 mpw
# Written.
#
# set target directory
# note, PUB-DIR and ROOT are passed on invocation line
TD := ${subst ${ROOT}/obj,${PUB-DIR},${CURDIR}}

# pattern rule to make all targets (directories are ignored)
${TD}/% : %
	if [ ! -d $< ]; then \
	  cp $< $@ ; \
	fi

# define list of targets (that's everything)
OBJS := ${patsubst %,${TD}/%,${wildcard *}}

all: ${TD} ${OBJS}

# make target directory if necessary
${TD}:
	mkdir -p ${TD}
create-html.py
#!/usr/local/bin/python
"""
NAME
    create-html.py - wraps HTML page contents with HTML page template

SYNOPSIS
    create-html.py template_file source_page_contents output_page

DESCRIPTION
    create-html.py will insert the contents of an HTML page into a supplied
    page template, outputting the results as a final HTML page.

    The title of the resulting page is determined by the first <h1>
    or <h2> header encountered in the page contents.

MODIFICATION HISTORY
    Mnemonic      Rel  Date    Who
    create-html   1.0  040614  mpw
        Written.
"""
import sys
import re

default_title = "hydrus.org.uk"

template_file = sys.argv[1]
html_in_file = sys.argv[2]
html_out_file = sys.argv[3]

template = open(template_file).read()
html_in = open(html_in_file).read()
html_out = open(html_out_file,mode="w")

# attempt to modify title to reflect page contents
re_title = re.compile(r'<title>.*?</title>')
re_header = re.compile(r'<h[12]>(.*)</h[12]>')
match = re_header.search(html_in)
if match != None:
    header = match.group(1)
    page_title = "<title>"+default_title+" - "+header+"</title>"
else:
    page_title = "<title>"+default_title+"</title>"
if re_title.search(template):
    template = re_title.sub(page_title,template)
content = template.replace("<!-- page contents go here -->",html_in)
html_out.write(content)
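To see the title handling in isolation, here is a small Python 3 sketch of the same logic applied to in-memory strings rather than files. The function names `build_title` and `wrap` are hypothetical, and the template text in the usage lines is made up for illustration.

```python
import re

DEFAULT_TITLE = "hydrus.org.uk"
RE_TITLE = re.compile(r"<title>.*?</title>")
RE_HEADER = re.compile(r"<h[12]>(.*)</h[12]>")

def build_title(html_in):
    """Derive a <title> element from the first <h1> or <h2> in the page."""
    match = RE_HEADER.search(html_in)
    if match is None:
        return "<title>" + DEFAULT_TITLE + "</title>"
    return "<title>" + DEFAULT_TITLE + " - " + match.group(1) + "</title>"

def wrap(template, html_in):
    """Set the template title, then drop the page contents into place."""
    title = build_title(html_in)
    # A function replacement keeps backslashes in the header text literal.
    template = RE_TITLE.sub(lambda _m: title, template)
    return template.replace("<!-- page contents go here -->", html_in)

if __name__ == "__main__":
    template = ("<html><head><title>placeholder</title></head>"
                "<body><!-- page contents go here --></body></html>")
    print(wrap(template, "<h1>Web Publishing using make</h1>\n<p>Some text.</p>"))
```

The one deliberate difference from the script above is the function passed to `sub`: with a plain string replacement, a backslash in the header text would be treated as an escape sequence.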
munge.py
#!/usr/local/bin/python
"""
NAME
    munge.py

SYNOPSIS
    python munge.py [-f cmd_file] [-n] file [...]

DESCRIPTION
    Performs editing functions on files specified on command line.
    Munge differs from sed and awk, in that it allows (nay, insists on)
    multi-line substitutions. Munge accepts the following commands
    from stdin (or the cmd_file if the -f option is given):

    %prefix
    .text.
    %end
        Prefixes the contents of the file with the .text. specified between
        %prefix and %end.

    %append
    .text.
    %end
        Appends the .text. specified between %append and %end to the
        contents of the file.

    %sub
    .regexp.
    %new
    .text.
    %end
        Substitutes the .regexp. with .text. in the file. Note that any
        trailing newline character in the regexp and text is removed.

    By default, regexps will match the . metacharacter to everything
    (including newline). Specifying -n on the command line will
    suppress this default. Since the regexps are passed to python
    unchanged, it is possible to specify alternate matching
    instructions via the regexp string itself (see the python
    documentation on how to do this).

MODIFICATION HISTORY
    Mnemonic  Rel  Date      Who
    munge.py  1.0  20040607  mpw
        Created
    munge.py  1.1  20040609  mpw
        Added -n option
"""
import os
import re
import sys
import getopt
#### munge command class - used to hold editing commands and string arguments
class mcmd:
    def __init__ (self,m,o,n,re_opts):
        self.cmd = m
        self.old = re.compile(o,re_opts)
        self.new = n
        self.next = None

    def set_next (self,n):
        self.next = n

    def execute (self,current):
        # direct call; equivalent to the deprecated apply() builtin
        return self.cmd(current,self.old,self.new)
#### munge edit operations
def mappend(current,old,new):
    return current+new

def mprefix(current,old,new):
    return new+current

def msub(current,old,new):
    if old.search(current):
        return old.sub(new,current)
    else:
        return current
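#### read lines from stream until block terminator
def getlines(instream,term):
    buf = ""
    l = instream.readline()
    # stop at the terminator line, or at end of file (readline returns "")
    # so that a missing terminator cannot cause an endless loop
    while l != "" and not l.startswith(term):
        buf = buf+l
        l = instream.readline()
    return buf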
#### process munge commands
def get_commands(instream,re_opts):
    cmd = None
    new = ""
    old = ""
    head = None
    tail = None
    try:
        while True:
            l = instream.readline().rstrip("\n")
            if l == "%prefix":
                cmd = mprefix
                new = getlines(instream,"%end")
            elif l == "%append":
                cmd = mappend
                new = getlines(instream,"%end")
            elif l == "%sub":
                cmd = msub
                old = getlines(instream,"%new").rstrip("\n")
                new = getlines(instream,"%end").rstrip("\n")
            elif l == "":
                return head
            else:
                print "munge: unrecognised command - quitting"
                sys.exit(1)
            if tail != None:
                tail.set_next(mcmd(cmd,old,new,re_opts))
                tail = tail.next
            else:
                tail = mcmd(cmd,old,new,re_opts)
                head = tail
    # catch read failures only; a bare except would also trap the
    # SystemExit raised above and report it as a file read error
    except IOError:
        print "File read error"
        raise
#+++++++++++++++++++++++++++++++++++++++++++++++++
# start of program
#+++++++++++++++++++++++++++++++++++++++++++++++++
#default is to read munge commands from stdin
cmdfile = sys.stdin
# default for regular expressions is . matches everything, including newline
re_opts = re.DOTALL
# read command line arguments, if any
try:
    opts,args = getopt.getopt(sys.argv[1:],'f:n')
    for o,v in opts:
        if o == '-f': cmdfile = open(v)
        elif o == '-n': re_opts = 0
except getopt.GetoptError:
    print "illegal argument"
    # non-zero status so that make notices the failure
    sys.exit(1)
# read cmdfile for the munge commands; returns cmd chain
head = get_commands(cmdfile,re_opts)
# apply munge cmd chain to each file on command line
for file in args:
    content = open(file).read()
    this = head
    while this != None:
        content = this.execute(content)
        this = this.next
    h = open(file,mode="w")
    h.write(content)
    h.close()
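The effect of the -n option is easiest to see with a substitution that spans lines. The Python 3 sketch below compares the same pattern compiled with and without `re.DOTALL`, which is the flag munge.py passes by default; the sample text and the function name `substitute` are made up for illustration.

```python
import re

def substitute(pattern, new, text, dotall=True):
    """Apply one substitution, with '.' matching newlines unless disabled."""
    flags = re.DOTALL if dotall else 0
    return re.compile(pattern, flags).sub(new, text)

text = "<body>\n<p>old\ncontent</p>\n</body>"
pattern = r"<p>.*</p>"

# Default munge behaviour: the match spans the line break.
multi_line = substitute(pattern, "<p>new</p>", text)

# Equivalent of -n: '.' stops at the newline, so nothing matches.
single_line = substitute(pattern, "<p>new</p>", text, dotall=False)
```

Here `multi_line` has the whole paragraph replaced, while `single_line` is identical to the input because the pattern cannot cross the newline inside the paragraph.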
Defects
You may have noticed that programs in cgi-bin are not included in this process. This is a problem because these programs generate pages on the fly, and therefore need to use the current page template. The name of the template is currently wired into the source code, rather than being discovered from some environment setting or such. This needs to be fixed.
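One possible shape for that fix, sketched in Python 3, is to look the template name up in the environment and fall back to the current default. The variable name `HYDRUS_TEMPLATE` and the function name are purely hypothetical; nothing in the existing cgi-bin programs uses them.

```python
import os

DEFAULT_TEMPLATE = "page-template.html"

def template_path(environ=os.environ):
    """Return the template named in the environment, else the default."""
    # HYDRUS_TEMPLATE is an illustrative name, not an existing setting.
    return environ.get("HYDRUS_TEMPLATE", DEFAULT_TEMPLATE)
```

The web server configuration would then be the single place where the template name is set for the dynamically generated pages.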
$Id: webpublish.html,v 1.3 2023/03/27 08:07:33 mark Exp $