Here is a short description of some projects I worked on in the period 1988-1999. These are roughly in order of when I began them. Practically all of these were projects driven by my own needs, or those of my friends. In all but one or two cases, the programs, and the approaches and algorithms they employed, were all of my own construction. None of this work was ever published, though some of these things did appear in Usenet newsgroups, principally in (the now defunct) comp.sources.unix.
In addition to these projects (which range in size from about 1000 lines to 15,000 lines), I have produced any number of smaller programs, mainly written in C, perl, Tcl/Tk and Emacs lisp. I have also produced several other large programs which have been published or are described elsewhere (principally Echo and AVS), and between 10 and 30 commercial web sites, depending on what you count.
A front-end for the vi editor. Kept a per-directory list of the most
recently visited files and provided fast access to these (by number,
by abbreviated pattern, by history). Editing the last file or second
last file visited in a directory was very fast. Provided simple
spelling correction (transposed, extra, or missing characters) on
file names. Detected and avoided
the vi modelines security problem. This program was written up in
the /bin column of the August 1990 edition of BYTE as one of the
author's favorite pieces of free software. The program was in
wide use around the world from 1989 onwards, after appearing in
comp.sources.unix.
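The correction amounted to asking whether a typed name was within a
single simple edit of a known file name. A minimal sketch of that
test (my reconstruction, not the original code; the name off_by_one
is illustrative):

    #include <string.h>

    /* Return 1 if `typed' differs from `name' by one transposition,
     * one extra character, or one missing character. */
    static int off_by_one(const char *typed, const char *name)
    {
        size_t tl = strlen(typed), nl = strlen(name);
        size_t i = 0;

        while (typed[i] && typed[i] == name[i])
            i++;

        if (tl == nl)                   /* Adjacent transposition. */
            return typed[i] && typed[i] == name[i + 1] &&
                   typed[i + 1] == name[i] &&
                   strcmp(typed + i + 2, name + i + 2) == 0;
        if (tl == nl + 1)               /* One extra character typed. */
            return strcmp(typed + i + 1, name + i) == 0;
        if (nl == tl + 1)               /* One missing character. */
            return strcmp(typed + i, name + i + 1) == 0;
        return 0;
    }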
A set of memory management algorithm simulation tools. Together with
Peter Buhr of the University of Waterloo, I implemented
approximately 6 memory management algorithms. I built a system to
run these and measure various aspects of their performance. The
system took files of memory size requests, delivered them to each
algorithm under controlled conditions, and calculated performance
results. The measurement system contained a timing package that
allowed for the timing of multiple simultaneous code sections. This
was done using the UNIX setitimer calls.
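For example, overlapping sections can be timed by counting periodic
timer ticks against every section currently open. A rough sketch of
that idea (the names here are illustrative, not the package's API):

    #include <signal.h>
    #include <sys/time.h>

    #define MAXSECTIONS 32

    static volatile sig_atomic_t ticks[MAXSECTIONS];
    static volatile sig_atomic_t active[MAXSECTIONS];

    /* Charge each timer tick to every section that is open. */
    static void tick(int sig)
    {
        int i;

        (void)sig;
        for (i = 0; i < MAXSECTIONS; i++)
            if (active[i])
                ticks[i]++;
    }

    void timing_init(void)
    {
        struct itimerval it;

        signal(SIGALRM, tick);
        it.it_interval.tv_sec = 0;
        it.it_interval.tv_usec = 10000;    /* One tick every 10ms. */
        it.it_value = it.it_interval;
        setitimer(ITIMER_REAL, &it, NULL);
    }

    void section_start(int s) { active[s] = 1; }
    void section_stop(int s)  { active[s] = 0; }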
A thread-safe replacement for BSD malloc. When I discovered how
horribly inefficient Berkeley malloc was, I wrote my own malloc.
This pre-allocated a large piece of memory (its size determined by an
initialization call) and set up special queues for frequently used
sizes. Initialization looked like

    lmmm_init(64 * 1024, sizeof(struct1), sizeof(struct2), ...);

The reasoning was that many programs allocate many structures
whose size they know ahead of time. The allocation algorithm first
consulted these fixed-size lists and, failing a reasonable match,
reverted to something that looked like BSD malloc (without the gross
repeated-sbrk inefficiencies and internal fragmentation caused by a
4K page size on our VAX). This was designed and written in 1989, and is
still in use in Waterloo's uKernel.
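The fixed-size queues can be sketched like so (a reconstruction of
the idea only: it ignores the pre-allocated region, locking, and the
BSD-style fallback, and the names are illustrative):

    #include <stddef.h>
    #include <stdlib.h>

    #define NSIZES 8

    struct chunk { struct chunk *next; };

    static size_t class_size[NSIZES];        /* From lmmm_init(). */
    static struct chunk *free_list[NSIZES];  /* One queue per size. */

    /* Assumes every registered size is >= sizeof(struct chunk *). */
    void *lmmm_alloc(size_t n)
    {
        int i;

        for (i = 0; i < NSIZES && class_size[i]; i++)
            if (class_size[i] == n && free_list[i]) {
                struct chunk *c = free_list[i]; /* Reuse a queued chunk. */
                free_list[i] = c->next;
                return c;
            }
        return malloc(n);                   /* General-purpose fallback. */
    }

    void lmmm_free(void *p, size_t n)
    {
        int i;

        for (i = 0; i < NSIZES && class_size[i]; i++)
            if (class_size[i] == n) {       /* Queue for fast reuse. */
                struct chunk *c = p;
                c->next = free_list[i];
                free_list[i] = c;
                return;
            }
        free(p);
    }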
A little language for the specification of graphs. This allowed the
simple production of a paper that required approximately 250
graphs. Provided for multiple edges, did layout of multiple graphs
in a fixed horizontal space and was generally useful. The program
produced pic output. 1989.
I wrote a package for the rapid construction of UNIX filters. I was
processing many multi-megabyte files of words and needed more speed
than awk. Perl barely existed. My package turned command-line
options into a C program that was compiled and executed on the
fly. The resulting executable could, of course, be reused. It also
provided the C code as a basis for the
construction of more advanced filters. The package took advantage of
/dev/tty, tricking the C compiler into compiling standard input,
removing the need for an intermediate C file. Written in C in 1988.
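The generated program was essentially a read-process-print loop with
the user's expression spliced into the middle; something of this
shape (a guess at the generated code, not the real output):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[BUFSIZ];

        while (fgets(line, sizeof(line), stdin)) {
            /* Code built from the command-line options went here,
             * e.g. print only lines longer than 40 characters: */
            if (strlen(line) > 40)
                fputs(line, stdout);
        }
        return 0;
    }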
A full Pascal compiler. This was written with a partner, and
completed in 13 weeks. We did it all, except for the final steps in
implementing sets. It produced VAX assembler and was more efficient
than the UNIX Pascal compiler in terms of size of code produced,
time to compile, and run time of the resulting code (a requirement
to pass the course). Written in C in 1988.
A novel peephole optimizer for the above compiler. This was based on
Aho & Corasick's algorithm used in fgrep. The optimizer was provided
with a file of patterns indicating the kinds of inefficiencies the
compiler was known to produce. These took the form of matching rules
and replacement actions. The optimizer (as a result of the
algorithm) did just a single pass over the assembler code, and ran
very quickly. Written in C in 1988.
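Conceptually, each rule paired an instruction sequence with a
replacement, along these lines (the patterns are invented, VAX-ish
illustrations, not rules from the actual file):

    /* A matching rule and its replacement action. */
    struct rule {
        const char *pattern;       /* Instruction sequence to match. */
        const char *replacement;   /* What to emit instead. */
    };

    static const struct rule rules[] = {
        /* A push immediately popped into a register is just a move. */
        { "pushl r0\npopl r1", "movl r0,r1" },
        /* Adding zero is a no-op. */
        { "addl2 $0,r0", "" },
    };

Because the Aho & Corasick machine matches all patterns
simultaneously, the whole rule set could be applied in that single
pass.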
M, a mail/man/mkdir/more/make substitute. This was based on the
observation that which of these common commands was intended could
usually be determined by examining the command's arguments. M worked
very well, but was not widely adopted. Written in C in 1989.
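The dispatch can be sketched as follows (these particular heuristics
are illustrative guesses, not M's actual rules):

    #include <string.h>
    #include <sys/stat.h>

    /* Guess which command was intended from M's first argument. */
    static const char *guess(const char *arg)
    {
        struct stat st;

        if (strchr(arg, '@'))
            return "mail";   /* Looks like a mail address. */
        if (*arg && arg[strlen(arg) - 1] == '/')
            return "mkdir";  /* Trailing slash: a new directory. */
        if (stat(arg, &st) == 0 && S_ISREG(st.st_mode))
            return "more";   /* An existing file: page it. */
        return "man";        /* Otherwise, try it as a manual page. */
    }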
A mail server that allowed a remote user to send UNIX commands in
the mail and receive the output back via mail. This was before the
days of telnet and ftp access to all machines. Security was minimal,
based on the incoming address and a password contained in the mail. I
used the system extensively to access my account at Waterloo when I
was traveling. Written in C in 1988.
Strsed, a sed-like C function for strings. This was written for inclusion
in a TCL-like interpreted language, but evolved into a separate C
function that appeared in comp.sources.unix in 1990. Allowed
substitution operations with full regular expressions (using either
the GNU or Henry Spencer regexp code) and transliteration, a la
tr. Understood a
full range of backslashed escape codes. Functionality was similar to what
perl provides with its substitution and transliteration operators. As of
2007, strsed is still used in some parts of Apple's Darwin Ports
collection. Written in C in 1990.
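Typical use looked something like this (the prototype is an
assumption from memory, not a quotation of the posted interface):

    #include <stdio.h>

    char *strsed(char *string, char *pattern, int *range);

    int main(void)
    {
        /* Regexp substitution, as in sed. */
        printf("%s\n", strsed("colour", "s/ou/o/g", NULL));
        /* Transliteration, as in tr. */
        printf("%s\n", strsed("hello", "y/a-z/A-Z/", NULL));
        return 0;
    }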
A company-wide telephone and address database. Built for a company
of 300 people, allowed immediate access to names, phone numbers,
addresses, etc., and also produced compact PostScript output for the
printing of phone lists. This was used by PCS GmbH in Munich. 1991.
A hashing package which I have used in many applications since
writing it. This does simple hashing with collision resolution by
examining a linked list of items in a bucket (most recently accessed
first). Written in C in 1990.
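The move-to-front detail is what keeps hot keys cheap: a hit is
unlinked and reinserted at the head of its bucket. A sketch (names
are illustrative, not the package's API):

    #include <string.h>

    #define NBUCKETS 1024

    struct item {
        char *key;
        void *value;
        struct item *next;
    };

    static struct item *buckets[NBUCKETS];

    static unsigned hash(const char *key)
    {
        unsigned h = 0;

        while (*key)
            h = h * 31 + (unsigned char)*key++;
        return h % NBUCKETS;
    }

    void *lookup(const char *key)
    {
        unsigned h = hash(key);
        struct item **ip, *found;

        for (ip = &buckets[h]; *ip; ip = &(*ip)->next)
            if (strcmp((*ip)->key, key) == 0) {
                found = *ip;
                *ip = found->next;        /* Unlink the item... */
                found->next = buckets[h]; /* ...and move it to the */
                buckets[h] = found;       /* front of its bucket. */
                return found->value;
            }
        return NULL;
    }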
A package for the processing of UNIX command-line options. Allows
long and short names and abbreviations, converts arguments and
places the converted results at caller-supplied addresses,
detects option ambiguities, allows mandatory options, does simple
range checking on numeric options, allows reading options from
files, and produces a usage function that shows documentation of all
known options. This has evolved over the years and is still, I
think, superior to the options processing in almost all UNIX
utilities I'm aware of. Written in 1990.
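The flavour of the interface might be imagined as a table of option
specifications (purely illustrative; this is not the package's
actual API):

    /* One entry per known option; converted values are written to
     * the caller-supplied address, with range checks for numbers. */
    static int verbose = 0;
    static int count = 1;
    static char *outfile = NULL;

    static struct option_spec {
        const char *name;   /* Long name; unique prefixes accepted. */
        char type;          /* 'b'oolean, 'i'nteger or 's'tring. */
        void *address;      /* Where the converted value is stored. */
        long lo, hi;        /* Legal range for numeric options. */
        int mandatory;      /* Must this option be given? */
    } specs[] = {
        { "verbose", 'b', &verbose, 0, 0,   0 },
        { "count",   'i', &count,   1, 100, 0 },
        { "output",  's', &outfile, 0, 0,   1 },
    };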
A dynamic trie-based escape sequence parser for a new version of
xterm. This allowed for the recognition of escape sequences sent to
a terminal emulator. New sets of sequences could be added or removed
without disturbing parsing. This was used in an xterm replacement
(emu) that four or five of us worked on at PCS in Munich. This was in use
towards the end of 1990. I believe it is still used by a
communications program (seyon). Written in C.
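Escape sequences map naturally onto a trie in which each byte
selects a child; adding or removing a sequence touches only its own
path. A sketch of the insertion side (names are mine, not emu's
code):

    #include <stdlib.h>

    struct node {
        struct node *child[256];  /* One slot per possible byte. */
        void (*action)(void);     /* Set at the end of a sequence. */
    };

    static void add_sequence(struct node *root, const unsigned char *seq,
                             void (*action)(void))
    {
        while (*seq) {
            if (root->child[*seq] == NULL)
                root->child[*seq] = calloc(1, sizeof(struct node));
            root = root->child[*seq++];
        }
        root->action = action;
    }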
A replacement for the UNIX man command. I wrote a nicely designed
program for more controlled access to UNIX man pages. This
presented the user with a list of matching pages and allowed rapid
selection. It allowed partial command names to be given (very useful
to see all pages starting with a string). Written in 1990 in C and
still in daily use.
A program to take a source tree and recursively check it in to
RCS. Added Header and Log fields to files and understood commenting
conventions. Figured out file types based on extensions and contents
and was generally VERY careful about its job. Used extensively to
import large source code distributions (including multiple copies of
the X Window System) by PCS in Munich. Written as an sh shell script in 1989.
An automated mailing list kit with news-to-mail and mail-to-news
gateways. This was used to administer the internet newsgroups
rec.juggling and rec.unicycling over the period 1991-93
(approximately). The mailing list software allowed for individual
message delivery, or digest form. The system also maintained a
newsgroup archive, which is the reason the entire archives of
rec.juggling are now available. Written in 1990 in C and sh shell
scripts.
A system for scheduling large numbers of processes across machines.
A controlling process was run periodically by cron to gather
outstanding jobs and allocate them to available machines. The
controlling process would also kill jobs on machines that had
interactive users, and schedule them for execution elsewhere. It had
a simple mechanism for distributing load according to client
processor power. The client process running on each of these
machines took jobs from its spool area until done. Simple commands
allowed a user to add jobs to the master process' spooling
area. Jobs could be split into multiple (identical) sub-jobs that
were run on different machines and their results would be collected
later for delivery to the user. This system was used by 2 PhD
researchers (including the author) to manage tens of thousands of
experiment runs on networks of up to 30 workstations. Written in
1991, used in 1991, 1994 and 1995.
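The client side can be sketched as draining a spool directory, one
job file at a time (my assumptions about the layout: one shell
script per job; all names are illustrative):

    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Run jobs from the spool until it is empty. */
    static void run_spool(const char *spooldir)
    {
        DIR *d;
        struct dirent *e;
        char cmd[1024];
        int ran;

        do {
            ran = 0;
            if ((d = opendir(spooldir)) == NULL)
                return;
            while ((e = readdir(d)) != NULL) {
                if (e->d_name[0] == '.')
                    continue;      /* Skip . and .. (and dotfiles). */
                snprintf(cmd, sizeof(cmd), "sh %s/%s && rm %s/%s",
                         spooldir, e->d_name, spooldir, e->d_name);
                system(cmd);       /* Run the job, then remove it. */
                ran = 1;
            }
            closedir(d);
        } while (ran);             /* Stop when a pass finds nothing. */
    }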
A tool for building and running feed-forward neural networks. This
had a simple network specification language and provided a command
line interpreter for training and using the network. 1990.
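The core of any such tool is the layer evaluation; one layer of a
feed-forward pass might look like this (a generic sketch, not the
tool's code):

    #include <math.h>

    /* Evaluate one layer: nout sigmoid units over nin inputs, with
     * weights stored row by row, bias last in each row. */
    static void layer_forward(int nin, int nout, const double in[],
                              double out[], const double w[])
    {
        int i, j;

        for (j = 0; j < nout; j++) {
            double sum = w[j * (nin + 1) + nin];       /* Bias. */

            for (i = 0; i < nin; i++)
                sum += w[j * (nin + 1) + i] * in[i];
            out[j] = 1.0 / (1.0 + exp(-sum));          /* Sigmoid. */
        }
    }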
A package for the implementation of genetic algorithms. This was
very general and very fast. Contained many operators for crossover,
mutation and selection, and allowed the user to add more; operators
could be selected from the command line. It did good logging and had
many options for output. Written in C in 1991-1994 and used by many
people on the
internet.
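As an example of such an operator, one-point crossover on
fixed-length strings is only a few lines (a generic sketch, not the
package's API):

    #include <stdlib.h>

    /* Children copy one parent up to a random point, the other after. */
    static void crossover(const char *p1, const char *p2,
                          char *c1, char *c2, int len)
    {
        int point = rand() % len, i;

        for (i = 0; i < len; i++) {
            c1[i] = i < point ? p1[i] : p2[i];
            c2[i] = i < point ? p2[i] : p1[i];
        }
    }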
An implementation of Kauffman's NK landscapes. This employed an
algorithm I devised based on a "jump table" to randomize
behavior. This was a difficult nut to crack; I had made several
earlier attempts. In the end my algorithm used linear space (all
previous solutions I'm aware of had used space that grew
exponentially with k), which allowed for landscapes with very large
k to be constructed. In use by various genetic and evolutionary
algorithm researchers around the world. Written in 1994 in C.
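To see where the space goes: each locus nominally needs a table of
2^(k+1) random contributions, one per setting of itself and its k
neighbours. A linear-space scheme avoids tabulating them by deriving
each contribution on demand; the sketch below does this with a
seeded hash (shown only to illustrate the space saving - it is not
the jump-table algorithm):

    #include <stdint.h>

    /* Derive the contribution for `locus' with packed neighbourhood
     * bits `state', deterministically from a fixed seed. */
    static double contribution(uint64_t seed, int locus, uint64_t state)
    {
        uint64_t h = seed ^ ((uint64_t)locus * 0x9e3779b97f4a7c15ULL)
                          ^ (state * 0xff51afd7ed558ccdULL);

        h ^= h >> 33;
        h *= 0xc4ceb9fe1a85ec53ULL;
        h ^= h >> 33;
        return (double)(h >> 11) / (double)(1ULL << 53);
    }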
I have fixed two problems in GNU emacs. The first was the
long-standing problem of sensitivity to command-line option
ordering. My solution was to sort the command line - emacs is
probably one of the only programs that sorts its command line. This
function is called on the first line of executable code in emacs'
main() function. I also
provided code to examine the emacs load-path variable to detect
instances of elisp files shadowing each other. This alerts people
installing new versions of emacs to potential conflicts from
previously installed site elisp. This function is run automatically
as the last thing that is done whenever emacs is installed. I thus
wrote the (current) first line of executable code in emacs and the
last code that is executed when emacs is installed. Written in C and
emacs lisp in 1994 and 1995.
A package allowing the simple treatment of ranges of real numbers.
This allowed a program to efficiently manage a set of real intervals
and test for membership, insert holes, etc. Useful for programs
that, for example, print a range of pages from a document and need
to parse a range specification from the command line before using
it. Written in C in 1995.
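The central operation is a membership test against a set of
intervals; stripped to its essentials (names illustrative; the real
package also handled hole insertion and parsing):

    struct range { double lo, hi; };

    /* Is x inside any of the n intervals? */
    static int member(const struct range *r, int n, double x)
    {
        int i;

        for (i = 0; i < n; i++)
            if (r[i].lo <= x && x <= r[i].hi)
                return 1;
        return 0;
    }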
A system for the automatic monitoring of topics of interest in a
news feed. Maintained a set of regular expressions on a per-user
basis, fetched new news from a server via NNTP, and built HTML for
the user to allow the retrieval of articles of interest. The system
contained over 20 administrator and user commands. This was used to
process an incoming news feed of 130,000 articles a day. Written in
perl and C in 1995 at the University of New South Wales.
wwwtable - a perl program for the production of HTML tables. Described
at http://fluidinfo.com/wwwtable/. In use by many people and apparently
included in a Japanese Linux distribution. Written in perl in 1996.
The spamometer, a program for detecting incoming spam mail. Described
at http://fluidinfo.com/spamometer/. This took a probabilistic
approach, suggested Bayesian filtering (which I did not implement),
and had a plugin architecture allowing others to write spam tests
that returned probabilities. It pre-dates Paul Graham's famous A Plan
for Spam article and SpamAssassin (which has a similar approach) by 5
years. Gary Robinson mentions the spamometer as "the earliest spam
filter work I've heard of" and it is mentioned in the page on the
pre-history of SpamAssassin. The spamometer is mainly of historical
interest now, especially for its potential use as prior art in
preventing patents. Written in perl in 1997.
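A plugin returned a probability that a message was spam; treating
those as independent pieces of evidence, they combine in the classic
way (an illustration of the approach, not necessarily the
spamometer's exact rule):

    /* Combine n independent spam probabilities:
     * p = (p1...pn) / (p1...pn + (1-p1)...(1-pn)). */
    static double combine(const double p[], int n)
    {
        double num = 1.0, den = 1.0;
        int i;

        for (i = 0; i < n; i++) {
            num *= p[i];
            den *= 1.0 - p[i];
        }
        return num / (num + den);
    }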
Htm4l, a set of over 500 m4 macros for the production of HTML.
Heavily used by Teclata S.L. in Spain and by various other people
around the world. An early version is described in
http://fluidinfo.com/htm4l/htm4l/. Written in m4
during 1996-1998.
A collection of utilities for getting web pages, analyzing their
contents, submitting them to search engines, tricking search
engines, analyzing web logs, checking web page links, etc.