Abstract URL System

a layer underneath the ArsDigita Community System by Philip Greenspun and Jon Salz

Tcl procedures: /packages/acs-core/ad-abstract-url-procs.tcl

The Problem

The main engineering ideas behind the ArsDigita Community System are (1) data models, (2) sequences of URLs that lead up to transactions, and (3) the specifications for those transactions.

We need to increase the amount of abstraction in specifying the URLs.

Right now (February 2000), we happen to use AOLserver and one of the following kinds of pages:

a file
a .adp template
a .spec file that implies further evaluation of templates
a lot of files containing things like JPEGs or videos where there is no practical opportunity for interpretation by the server

Think about it: when the SAP guys started up in 1972 they probably did a lot of things for which they are now sorry. In 30 years we will probably still have some vestiges of our data model and workflow. But the specific languages and systems being used today will likely change. In fact, we've already talked about building versions of the ACS that (a) run inside Oracle using their embedded Java Web server, (b) run with Microsoft Active Server Pages, or (c) run inside Apache mod_perl. If a publisher swaps out AOLserver for one of these other systems or if we, in an ACS version upgrade, swap in .spec templating, why should the user have to update his bookmarks?

The Solution

We register a procedure that will, given a URL with no extension, dig around in the file system to find the right files to deliver/execute. This is analogous to what AOLserver already does when it gets a directory name. There is also an Apache module that does some of this (see http://www.apache.org/docs/content-negotiation). Here's an example of the algorithm:

1. Is there a .spec file, indicating usage of the super-whizzy templating system? If so, evaluate it. If not, proceed to next step.
2. Is there a file, indicating old-style code or code that will look for a .adp template? If so, evaluate it. If not, proceed to next step.
3. Does the user's session indicate that he or she wants WML for a wireless device? If so, try to find a .wml file and serve it. If no session info or no .wml file, proceed to next step.
4. Look for a file
5. Look for a .txt file
6. Look for a .jpeg
7. Look for a .gif

Right now we implement a subset of this. The current algorithm (sure to be enhanced in the near future as we add support for scoping and rethink templates) is as follows:

1. If the URL specifies a directory but doesn't have a trailing slash, append a slash to the URL and redirect (just like AOLserver would).
2. If the URL specifies a directory and does have a trailing slash, append "index" to the URL (so we'll search for an index.* file in the filesystem).
3. If the file corresponding to the requested URL exists (probably because the user provided the extension), just deliver the file.
4. Find a file in the file system with the provided URL as the root (i.e., some file exists which is the URL plus some extension). Give precedence to extensions specified in the ExtensionPrecedence parameter in the abstract-url configuration section (in the order provided there). If such a file exists, deliver it.
5. The requested resource doesn't exist - return a 404 Not Found.
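
To make the lookup order concrete, here is a minimal sketch of the steps above. It is written in Python purely for illustration and is not the Tcl code in ad-abstract-url-procs.tcl. The function name resolve, the page_root argument, and the return tuples are all hypothetical; extension_precedence stands in for the ExtensionPrecedence parameter, and how ties among unlisted extensions are broken (alphabetically here) is our assumption.

```python
import os


def resolve(page_root, url_path, extension_precedence):
    """Map an abstract URL path to ("redirect", url), ("file", path)
    or ("not_found", None).

    Illustrative only: no protection against ".." segments, no caching.
    """
    fs_path = os.path.join(page_root, url_path.lstrip("/"))

    if os.path.isdir(fs_path):
        if not url_path.endswith("/"):
            # Directory without a trailing slash: redirect to the slashed URL.
            return ("redirect", url_path + "/")
        # Directory with a trailing slash: search for index.* inside it.
        fs_path = os.path.join(fs_path, "index")

    # The exact file exists (the user probably supplied the extension).
    if os.path.isfile(fs_path):
        return ("file", fs_path)

    # Otherwise look for "<root>.<some extension>" in the same directory.
    directory, root = os.path.split(fs_path)
    if not os.path.isdir(directory):
        return ("not_found", None)

    prefix = root + "."
    candidates = [
        name for name in os.listdir(directory)
        if name.startswith(prefix)
        and os.path.isfile(os.path.join(directory, name))
    ]
    if not candidates:
        return ("not_found", None)

    def rank(name):
        # Listed extensions win in the order given; unlisted ones come
        # last, ordered alphabetically (an assumption of this sketch).
        extension = name[len(prefix):]
        if extension in extension_precedence:
            return (extension_precedence.index(extension), name)
        return (len(extension_precedence), name)

    return ("file", os.path.join(directory, min(candidates, key=rank)))
```

With a precedence list such as ["tcl", "adp", "html"] (example values, not the shipped configuration), a request for /docs/page would pick page.tcl over page.adp, and a caller would translate "not_found" into the 404 response.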

We are likely to add some steps at the very beginning of this to perform scoping, e.g. check if the URL begins with a group name (and optional group type), and if so set scope variables in the environment and munge the URL accordingly.
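
Since scoping is not implemented yet, the following is only a guess at what such a step might look like, sketched in Python for illustration. The function name extract_scope, the idea that the group type precedes the group name in the URL, and the example names chess-club and clubs are all assumptions, not part of the current system.

```python
def extract_scope(url_path, group_names, group_types=()):
    """Split a leading group scope off url_path.

    Returns (scope, remaining_path); scope is an empty dict when the
    URL does not begin with a known group name.
    """
    segments = url_path.lstrip("/").split("/")
    scope = {}

    # Assumed layout: an optional group type comes first, and only
    # counts as a type when a known group name follows it.
    if (len(segments) >= 2
            and segments[0] in group_types
            and segments[1] in group_names):
        scope["group_type"] = segments[0]
        segments = segments[1:]

    if segments and segments[0] in group_names:
        scope["group_name"] = segments[0]
        segments = segments[1:]

    # The munged URL is what the rest of the lookup would see.
    return scope, "/" + "/".join(segments)
```

For example, extract_scope("/chess-club/bboard/index", {"chess-club"}) would yield the scope {"group_name": "chess-club"} and the munged URL /bboard/index, which then goes through the ordinary file lookup.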

Note that we perform a lookup even if a URL with an extension is provided. This is so we can eventually perform content negotiation even within the content-type domain, e.g. serve up a document in French (foobar.fr) or the King's English (foobar.en.uk) as opposed to the default Yankeespeak (foobar or foobar.en.us) depending on the browser's Accept-Language setting.
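
Language negotiation is not implemented either, but a rough sketch of the idea (in Python, for illustration only) might look like this. The function names and the SUFFIX_FOR_TAG table are hypothetical; in particular, how a browser tag such as en-gb would map onto a file suffix such as en.uk is an assumption we made up for this example.

```python
# Hypothetical mapping from Accept-Language tags to file suffixes.
SUFFIX_FOR_TAG = {"fr": "fr", "en-gb": "en.uk", "en-us": "en.us"}


def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without q= gets full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def choose_variant(root, available, header):
    """Pick the best language variant of root from the available file names."""
    for tag in parse_accept_language(header):
        suffix = SUFFIX_FOR_TAG.get(tag)
        if suffix and root + "." + suffix in available:
            return root + "." + suffix
    # No acceptable variant: fall back to the untagged default, if any.
    return root if root in available else None
```

Given the files foobar, foobar.fr and foobar.en.uk, a browser sending "fr, en-gb;q=0.8" would get foobar.fr, while one sending only "de" would fall back to the default foobar.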

Open questions:

Is there any value in abstracting URLs for big ugly binary files such as JPEG, video, PowerPoint, Word docs, etc.? (I think so - this enables us to change resource types more easily, i.e. replace GIFs with JPEGs or Word documents with HTML files, which is a primary goal of this system in the first place. Our ultimate goal should be the removal of all extensions from URLs throughout ACS. -JS)
Is it worth caching all of these file system probes? (My gut reaction is that it is not; caching will take place in the OS's file system layer anyway, and it would be tricky, although not that tricky, to properly support the addition/removal of files from the file system without explicitly flushing the caches. In any case, caching is not part of the current implementation although it could certainly be added in a future version. -JS)

Minor Benefits:

Tim Berners-Lee will be happy; he doesn't like to see extensions in URLs
People who are language bigots and prefer (Perl|Java|Lisp|C) to Tcl will not be put off by the mere URLs

philg@mit.edu

jsalz@mit.edu