When I originally built the hydrus web site, I used frames to present a static navigation bar, with another frame for the page content. This made it easy to define the page contents without the overhead of repeating the standard images and navigation on each page. However, visitors who linked directly to a page saw it without the navigation frame. This made it difficult for them to reach other sections of the site, and even obscured the site's identity.
In order to remove frames, I needed some mechanism to support common content in each page - a template. As I envisaged that it would be necessary to change the page template to cater for new menu items, changes in style and so on, I had to develop an automated process to create the website from a template and specific page contents. In addition, I didn't want to be re-creating the whole website every time I made a small content change, so I needed something that would only make the minimal set of updates needed. This sounded like a job for make (or specifically GNU make, which is what I used).
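The property that makes make a good fit is its out-of-date test: a target is rebuilt only if it is missing or older than one of its prerequisites. The Python sketch below illustrates that rule using file modification times. The function name `needs_rebuild` is hypothetical, and real make also handles phony targets and chains of intermediate rules, which this ignores.

```python
import os

def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite.

    A simplified model of make's out-of-date check (hypothetical helper):
    a prerequisite with the same timestamp as the target does not trigger
    a rebuild, only a strictly newer one does.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

Editing one page therefore only makes that page's target out of date, whereas editing the page template (a prerequisite of every page) makes all of them out of date.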
After a period of experimentation, I built a system which relied on three Makefiles and a couple of Python scripts. The master Makefile invokes two subordinate makefiles, compile.mk and publish.mk, which are responsible for building the final pages and installing them in the public HTML directory, respectively.

To merge the page contents into a template, the Python script create-html.py is used. In addition, some file contents need to be modified during processing; this capability is provided by munge.py.
The system requires two directory structures under a root directory, which itself contains the page templates, Makefiles, and Python scripts. The src subdirectory contains the basic page content, images and other support files (e.g. the index file for the technical journal). The obj subdirectory contains the results of combining the page contents with the page template, together with copies of the non-page files. Once obj has been built, the web site files can be examined to ensure correctness before the final stage of publishing. The process is invoked by the following command, issued in the root directory:
make compile
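compile.mk decides where each output file belongs by mirroring the source tree: it takes the current directory and substitutes `obj` for `src` using make's `subst` function. One caveat is that `subst` replaces every occurrence of the string, so a root path that itself contains `src` is altered too. The Python sketch below contrasts that behaviour with a mapping that only swaps the `src` directory directly under the site root. The function names and example paths are hypothetical.

```python
from pathlib import PurePosixPath

def obj_path(root, src_dir):
    """Map a directory under <root>/src to the matching one under <root>/obj.

    Only the 'src' component directly below `root` is replaced, so a root
    such as /usr/src/hydrus is left untouched. Raises ValueError if
    `src_dir` is not inside <root>/src.
    """
    root = PurePosixPath(root)
    rel = PurePosixPath(src_dir).relative_to(root / "src")
    return str(root / "obj" / rel)

def make_subst(src_dir):
    # What ${subst src,obj,${CURDIR}} does: replace every occurrence.
    return src_dir.replace("src", "obj")
```

With a root of /usr/src/hydrus, `make_subst` turns /usr/src/hydrus/src/journal into /usr/obj/hydrus/obj/journal, while `obj_path` gives /usr/src/hydrus/obj/journal. Keeping the site root free of the string `src` avoids the problem in the makefile version.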
The publishing process merely copies all newly modified files from the obj directory to the web site location:
make publish
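publish.mk does this with a pattern rule: each file under obj is a prerequisite of its counterpart in the web directory, so make copies only the files that are missing or out of date at the destination. As a rough illustration outside make, the same walk-and-copy-if-newer logic might look like this in Python. The name `publish_tree` is hypothetical, and unlike `cp` it preserves timestamps by using `shutil.copy2`.

```python
import os
import shutil

def publish_tree(obj_root, pub_root):
    """Copy files from obj_root to pub_root when missing or out of date.

    Directories are created as needed (like the ${TD} rule in publish.mk).
    Returns the list of destination paths that were copied.
    """
    copied = []
    for dirpath, dirnames, filenames in os.walk(obj_root):
        rel = os.path.relpath(dirpath, obj_root)
        dest_dir = os.path.normpath(os.path.join(pub_root, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest_dir, name)
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                shutil.copy2(src, dst)   # copy2 also copies the mtime
                copied.append(dst)
    return copied
```

Running it a second time with no changes in obj copies nothing, which is the same "minimal set of updates" behaviour the makefiles provide.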
The process was complicated by the presence of HTML files generated from Docbook markup. The jade processor I used to convert Docbook to HTML produced HTML 4.01, whereas the rest of the web site conforms to XHTML 1.0. I therefore needed a slightly different template for the Docbook HTML, as well as the ability to remove the headers and footers from the generated HTML on the fly. This latter requirement stemmed from the fact that the HTML files would be regenerated if I had to revise the original Docbook XML file, so any edits made to them by hand would be lost.
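In compile.mk this is handled by two checks on each source page. If the page still contains a `<BODY` tag, munge.py is run with the remove-body command file, which (according to the comment in compile.mk) removes the `<BODY>` tags and replaces them with a comment marking the page as Docbook HTML; the presence of `<!-- DOCBOOK -->` then selects the Docbook template. The Python sketch below approximates that decision. The remove-body commands are not shown in this article, so the regular expression used for stripping is an assumption, and the function names are hypothetical. Note also that munge.py edits the source file in place, whereas this sketch only returns the modified text.

```python
import re

DOCBOOK_MARKER = "<!-- DOCBOOK -->"

# Assumption: stands in for the unpublished remove-body munge commands by
# keeping only what lies between <BODY ...> and </BODY>.
_BODY = re.compile(r".*?<BODY[^>]*>(.*?)</BODY>.*", re.DOTALL)

def prepare_page(html):
    """Return (page_contents, is_docbook) for one source page."""
    if "<BODY" in html:                 # same test as grep "<BODY"
        match = _BODY.match(html)
        if match:
            html = DOCBOOK_MARKER + match.group(1)
    return html, DOCBOOK_MARKER in html

def choose_template(html, normal, docbook):
    """Pick the template name and the contents to insert into it."""
    contents, is_docbook = prepare_page(html)
    return (docbook if is_docbook else normal), contents
```

Because the marker survives in the stripped page, a page that has already been processed once is still recognised as Docbook HTML on later runs.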
The makefiles were modified to handle two new factors: the addition of a configuration control system (CVS) and the conversion to complete XHTML compliance (see Docbook and XHTML for details).
This made the processing considerably simpler. See Improved Web Publishing.
```make
# Makefile to process and publish hydrus web contents
#
#
# make compile - create pages by placing page contents into the standard
#                page template. Docbook pages need a different template
#                since they are in HTML 4.01.
#
# make publish - copy processed pages to web directory
#
#
# MODIFICATION HISTORY
# Mnemonic     Date    Rel  Who
# www-publish  040615  1.0  mpw
#     Written.
#
SRC := ./src
OBJ := ./obj
JOURNAL-SRC := ${SRC}/journal
PUB-DIR := /usr/local/www/data
TEMPLATE-NORMAL := page-template.html
TEMPLATE-DOCBOOK := page-template-docbook.html

.PHONY: compile publish clean template link

compile: template link
	find ${SRC} -type d -exec gmake -C {} -f ${CURDIR}/compile.mk \
		TEMPLATE-NORMAL=${CURDIR}/${TEMPLATE-NORMAL} \
		TEMPLATE-DOCBOOK=${CURDIR}/${TEMPLATE-DOCBOOK} ROOT=${CURDIR} \;
	cp ${TEMPLATE-NORMAL} ${OBJ}
	chmod 644 ${OBJ}/${TEMPLATE-NORMAL}

publish:
	find ${OBJ} -type d -exec gmake -C {} -f ${CURDIR}/publish.mk \
		ROOT=${CURDIR} PUB-DIR=${PUB-DIR} \;

# update internal links in journal if necessary
link:
	cd ${JOURNAL-SRC}; \
	newlink.py

clean:
	rm -rf ${OBJ}/*

# Update docbook template if standard template has changed
template: ${TEMPLATE-DOCBOOK}

${TEMPLATE-DOCBOOK}: ${TEMPLATE-NORMAL}
	cp ${TEMPLATE-NORMAL} ${TEMPLATE-DOCBOOK}
	munge.py -f create-docbook-template ${TEMPLATE-DOCBOOK}
```
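The `compile` and `publish` targets rely on `find ... -type d -exec gmake -C {} ...` to run the subordinate makefile once in every directory of the tree, so each invocation only has to deal with the files in its own directory. The Python sketch below builds the equivalent list of commands without running them. The function name is hypothetical, and, as in the Makefile, `gmake` is assumed to be on the PATH.

```python
import os

def make_invocations(tree, makefile, variables):
    """List the gmake commands that `find tree -type d -exec gmake ...` runs.

    One command per directory (including `tree` itself); each changes into
    that directory (-C) and passes the variables as NAME=value arguments.
    A dict is used because names such as TEMPLATE-NORMAL contain hyphens.
    """
    assigns = ["%s=%s" % (k, v) for k, v in sorted(variables.items())]
    commands = []
    for dirpath, dirnames, filenames in os.walk(tree):
        dirnames.sort()   # deterministic order; find does not sort
        commands.append(["gmake", "-C", dirpath, "-f", makefile] + assigns)
    return commands
```

Each command could then be executed with `subprocess.run(cmd, check=True)`.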
```make
# Makefile for constructing publishable html file from source html
# files and a template. The variable TEMPLATE-NORMAL and
# TEMPLATE-DOCBOOK should be passed as an argument to the make
# directive. One or other of them are used to create all the target
# html pages, and therefore all target files are dependent on them.
#
# MODIFICATION HISTORY
# Mnemonic     Date    Rel  Who
# www-publish  040615  1.0  mpw
#     Written.
#

# set target directory
TD := ${subst src,obj,${CURDIR}}

# define pattern rule for producing .html files in the target directory
# Docbook HTML files may have a body in them, so we remove the <BODY> tags
# and replace with a comment indicating this file is docbook html.
# N.B. This relies on using docbook2html (i.e. jade) to produce the HTML
# files; xmlto produces different signatures.
# This comment is used to determine if the docbook template should be used.
# If not, the normal template is applied.
${TD}/%.html : %.html
	grep "<BODY" $< >/dev/null ; \
	if [ $$? -eq 0 ]; then \
		munge.py -f ${ROOT}/remove-body $< ; \
	fi
	grep "<!-- DOCBOOK -->" $< >/dev/null ; \
	if [ $$? -eq 0 ]; then \
		${ROOT}/create-html.py ${TEMPLATE-DOCBOOK} $< $@; \
	else \
		${ROOT}/create-html.py ${TEMPLATE-NORMAL} $< $@ ; \
	fi

# pattern rule to make non-html targets (images, support files, etc)
# note we ignore directories
${TD}/% : %
	if [ ! -d $< ]; then \
		cp $< $@; \
	fi

# define list of targets (based on list of .html files in current directory)
OBJS := ${patsubst %,${TD}/%,${wildcard *.html}}

# define list of non-html targets
OTHER := ${patsubst %,${TD}/%,${filter-out %.html,${wildcard *}}}

all: ${TD} ${OBJS} ${OTHER}

${OBJS}: ${TEMPLATE-NORMAL} ${TEMPLATE-DOCBOOK}

# make target directory if necessary
${TD}:
	mkdir -p ${TD}
```
```make
# Makefile for publishing html files from processed html pages
#
# MODIFICATION HISTORY
# Mnemonic     Date    Rel  Who
# www-publish  040615  1.0  mpw
#     Written.
#

# set target directory
# note, PUB-DIR and ROOT are passed on invocation line
TD := ${subst ${ROOT}/obj,${PUB-DIR},${CURDIR}}

# pattern rule to make all targets (directories are ignored)
${TD}/% : %
	if [ ! -d $< ]; then \
		cp $< $@ ; \
	fi

# define list of targets (that's everything)
OBJS := ${patsubst %,${TD}/%,${wildcard *}}

all: ${TD} ${OBJS}

# make target directory if necessary
${TD}:
	mkdir -p ${TD}
```
```python
#!/usr/local/bin/python
"""
NAME
    create-html.py - wraps HTML page contents with HTML page template

SYNOPSIS
    create-html.py template_file source_page_contents output_page

DESCRIPTION
    create-html.py will insert the contents of an HTML page into a
    supplied page template, outputting the results as a final HTML page.
    The title of the resulting page is determined by the first <h1> or
    <h2> header encountered in the page contents.

MODIFICATION HISTORY
Mnemonic     Rel  Date    Who
create-html  1.0  040614  mpw
    Written.
"""
import sys
import re

default_title = "hydrus.org.uk"

template_file = sys.argv[1]
html_in_file = sys.argv[2]
html_out_file = sys.argv[3]

template = open(template_file).read()
html_in = open(html_in_file).read()

# attempt to modify title to reflect page contents
re_title = re.compile(r'<title>.*?</title>')
# non-greedy, so two headers on the same line are not merged into one title
re_header = re.compile(r'<h[12]>(.*?)</h[12]>')

match = re_header.search(html_in)
if match != None:
    header = match.group(1)
    page_title = "<title>"+default_title+" - "+header+"</title>"
else:
    page_title = "<title>"+default_title+"</title>"

if re_title.search(template):
    # a function replacement inserts the title literally, so backslashes
    # in the header text are not treated as escapes by re.sub
    template = re_title.sub(lambda m: page_title, template)

content = template.replace("<!-- page contents go here -->",html_in)

html_out = open(html_out_file,mode="w")
html_out.write(content)
html_out.close()
```
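Because create-html.py does its work at module level on `sys.argv`, it is awkward to test in isolation. One possible refactoring into a pure function is sketched below; the name `wrap_page` is hypothetical, but the title rule and the `<!-- page contents go here -->` placeholder are the same as in the script.

```python
import re

DEFAULT_TITLE = "hydrus.org.uk"
PLACEHOLDER = "<!-- page contents go here -->"
RE_TITLE = re.compile(r"<title>.*?</title>")
RE_HEADER = re.compile(r"<h[12]>(.*?)</h[12]>")

def wrap_page(template, contents, default_title=DEFAULT_TITLE):
    """Insert `contents` into `template`, titling it from the first h1/h2."""
    match = RE_HEADER.search(contents)
    if match:
        title = "<title>%s - %s</title>" % (default_title, match.group(1))
    else:
        title = "<title>%s</title>" % default_title
    # A function replacement inserts the title literally (no \-escapes);
    # if the template has no <title>, sub() leaves it unchanged.
    template = RE_TITLE.sub(lambda m: title, template)
    return template.replace(PLACEHOLDER, contents)
```

The command-line script would then reduce to reading the two input files, calling `wrap_page`, and writing the result.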
```python
#!/usr/local/bin/python
"""
NAME
    munge.py

SYNOPSIS
    python munge.py [-f cmd_file] [-n] file [...]

DESCRIPTION
    Performs editing functions on files specified on command line.
    Munge differs from sed and awk, in that it allows (nay, insists on)
    multi-line substitutions.

    Munge accepts the following commands from stdin (or the cmd_file if
    the -f option is given):

    %prefix
    .text.
    %end
        Prefixes the contents of the file with the .text. specified
        between %prefix and %end.

    %append
    .text.
    %end
        Appends the .text. specified between %append and %end to the
        contents of the file.

    %sub
    .regexp.
    %new
    .text.
    %end
        Substitutes the .regexp. with .text. in the file. Note that any
        trailing newline character in the regexp and text is removed.

    By default, regexps will match the . metacharacter to everything
    (including newline). Specifying -n on the command line will suppress
    this default. Since the regexps are passed to python unchanged, it is
    possible to specify alternate matching instructions via the regexp
    string itself (see the python documentation on how to do this).

MODIFICATION HISTORY
Mnemonic  Rel  Date      Who
munge.py  1.0  20040607  mpw
    Created
munge.py  1.1  20040609  mpw
    Added -n option
"""
import re
import sys
import getopt

#### munge command class - used to hold editing commands and string arguments
class mcmd:
    def __init__ (self,m,o,n,re_opts):
        self.cmd = m
        self.old = re.compile(o,re_opts)
        self.new = n
        self.next = None

    def set_next (self,n):
        self.next = n

    def execute (self,current):
        # call the edit function directly (apply() no longer exists)
        return self.cmd(current,self.old,self.new)

#### munge edit operations
def mappend(current,old,new):
    return current+new

def mprefix(current,old,new):
    return new+current

def msub(current,old,new):
    if old.search(current):
        return old.sub(new,current)
    else:
        return current

#### read lines from stream until a line starting with the block terminator;
#### stop at end of file instead of looping forever
def getlines(instream,term):
    buf = ""
    l = instream.readline()
    while l and not l.startswith(term):
        buf = buf+l
        l = instream.readline()
    return buf

#### process munge commands
def get_commands(instream,re_opts):
    cmd = None
    new = ""
    old = ""
    head = None
    tail = None
    try:
        while True:
            l = instream.readline().rstrip("\n")
            if l == "%prefix":
                cmd = mprefix
                new = getlines(instream,"%end")
            elif l == "%append":
                cmd = mappend
                new = getlines(instream,"%end")
            elif l == "%sub":
                cmd = msub
                old = getlines(instream,"%new").rstrip("\n")
                new = getlines(instream,"%end").rstrip("\n")
            elif l == "":
                return head
            else:
                print("munge: unrecognised command - quitting")
                sys.exit(1)
            if tail != None:
                tail.set_next(mcmd(cmd,old,new,re_opts))
                tail = tail.next
            else:
                tail = mcmd(cmd,old,new,re_opts)
                head = tail
    except IOError:
        # only genuine read errors land here, not the sys.exit() above
        print("File read error")
        raise

#+++++++++++++++++++++++++++++++++++++++++++++++++
# start of program
#+++++++++++++++++++++++++++++++++++++++++++++++++

# default is to read munge commands from stdin
cmdfile = sys.stdin
# default for regular expressions is . matches everything, including newline
re_opts = re.DOTALL

# read command line arguments, if any
try:
    opts,args = getopt.getopt(sys.argv[1:],'f:n')
    for o,v in opts:
        if o == '-f':
            cmdfile = open(v)
        elif o == '-n':
            re_opts = 0
except getopt.GetoptError:
    print("illegal argument")
    sys.exit(1)   # non-zero, so a calling makefile sees the failure

# read cmdfile for the munge commands; returns cmd chain
head = get_commands(cmdfile,re_opts)

# apply munge cmd chain to each file on command line
for fname in args:
    content = open(fname).read()
    this = head
    while this != None:
        content = this.execute(content)
        this = this.next
    h = open(fname,mode="w")
    h.write(content)
    h.close()
```
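To make the command-file format concrete, here is a small Python 3 sketch that parses the same `%prefix`, `%append` and `%sub ... %new ... %end` blocks and applies them to a string. It is a simplified re-implementation for illustration, with hypothetical function names, not a drop-in replacement for munge.py; as in munge.py, regular expressions are compiled with `re.DOTALL` by default and parsing stops at a blank line.

```python
import re

def parse_commands(text, flags=re.DOTALL):
    """Parse munge-style commands into a list of str -> str functions."""
    lines = iter(text.splitlines(keepends=True))

    def block(term):
        # Collect lines up to (not including) the one starting with `term`.
        buf = ""
        for line in lines:
            if line.startswith(term):
                return buf
            buf += line
        raise ValueError("missing " + term)

    commands = []
    for line in lines:
        cmd = line.rstrip("\n")
        if cmd == "%prefix":
            txt = block("%end")
            commands.append(lambda s, t=txt: t + s)
        elif cmd == "%append":
            txt = block("%end")
            commands.append(lambda s, t=txt: s + t)
        elif cmd == "%sub":
            old = re.compile(block("%new").rstrip("\n"), flags)
            new = block("%end").rstrip("\n")
            commands.append(lambda s, o=old, n=new: o.sub(n, s))
        elif cmd == "":
            break          # munge.py also stops reading at a blank line
        else:
            raise ValueError("unrecognised command: " + cmd)
    return commands

def munge(content, command_text, flags=re.DOTALL):
    """Apply every command, in order, to `content`."""
    for command in parse_commands(command_text, flags):
        content = command(content)
    return content
```

For example, a command file containing `%sub`, `<BODY.*?>`, `%new`, `<!-- DOCBOOK -->`, `%end` (one per line) would replace a Docbook `<BODY ...>` tag with the marker comment that compile.mk looks for.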
You may have noticed that programs in cgi-bin are not included in this process. This is a problem because these programs generate pages on the fly, and therefore need to use the current page template. The name of the template is currently hard-wired into their source code, rather than being discovered from an environment setting or similar. This needs to be fixed.
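One way to address this, sketched below under assumptions, is for the CGI programs to read the template path from an environment variable and fall back to a fixed location. The variable name `HYDRUS_TEMPLATE` is hypothetical. The fallback path assumes the published copy of page-template.html: the master Makefile copies it into obj, so `make publish` should place it at the top of PUB-DIR. For a CGI program the variable would have to be set in the web server's environment, for example with Apache's `SetEnv` directive.

```python
import os

# Hypothetical default: the published copy of the standard template.
DEFAULT_TEMPLATE = "/usr/local/www/data/page-template.html"

def template_path(environ=None):
    """Return the page template path, preferring $HYDRUS_TEMPLATE."""
    if environ is None:
        environ = os.environ
    return environ.get("HYDRUS_TEMPLATE", DEFAULT_TEMPLATE)

def load_template(environ=None):
    """Read the current page template for a dynamically generated page."""
    with open(template_path(environ)) as f:
        return f.read()
```

Passing the environment as a parameter keeps the lookup easy to test without touching the real process environment.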
$Id: webpublish.html,v 1.3 2023/03/27 08:07:33 mark Exp $