"Owing to the neglect of our defences and the mishandling of the German problem in the last five years, we seem to be very near the bleak choice between War and Shame. My feeling is that we shall choose Shame, and then have War thrown in a little later, on even more adverse terms than at present."
Winston Churchill in a letter to Lord Moyne, 1938 [Gilbert 1991]
If you asked a naive user what the Web would do for them, they'd probably say "I could ask my computer to find me the cheapest pair of blue jeans being sold on the Internet and 10 seconds later, I'd be staring at a photo of the product and being asked to confirm the purchase. I'd see an announcement for a concert and click a button on my Web browser to add the date to my calendar; the information would get transferred automatically."
We computer scientists know that the Web doesn't actually work this way for naive users. Of course, armed with 20 years of Internet experience and the latest in equipment and software, we computer scientists go out into the Web and... fall into exactly the same morass. When we find a conference announcement, we can't click the mouse and watch entries show up in our electronic calendars. We will have to wait for computers to develop natural language understanding and common sense reasoning. That doesn't seem like such a long way off until one reflects that, given the ability to understand language and reason a bit, the computer could go to college for four years and come back capable of taking over our job.
Recently adopted HTML style sheets offer us a glimmer of hope on the formatting front. It may yet be possible to render a novel readably in HTML. However, style sheets can't fix all of HTML's formatting deficiencies and certainly don't accomplish anything on the semantic tagging front.
Increased formatting capabilities are fundamentally beneficial. It is more efficient for one person to spend a few days formatting a document well than for 20 million users to each spend five minutes formatting a document badly. Yet the original Web model was the latter. Users would edit resource files on the Unix machines or dialog boxes on their Macintosh to choose the fonts, sizes, and colors that best suited their hardware and taste. Still, when they were all done, Travels with Samantha and the Bible ended up looking more or less the same.
The Netscape extensions ushered in a new era of professional document design, but it hasn't all been for the best, especially the round introduced with Netscape 2.0. HTML documents may have looked clunky back in 1993 but at least they all worked the same. Users knew that if they saw something black, they should read it. If they saw something gray, that would be the background. If they saw something blue, they should click on it. Unlike CD-ROMs, web sites did not have sui generis navigation tools or colors that took a few minutes to learn. Web sites had user interface stability, the same thing that made the Macintosh's pull-down menus so successful (because the print command was always in the same place, even if it was sometimes grayed-out).
Netscape 1.1 allowed publishers to play with the background, text, link, and visited link colors. Oftentimes, a graphic designer would note that most of the text on a page was hyperlinks and therefore just make all the text black. Alternatively, he or she would choose a funky color for a background and then three more funky colors for text, link, and visited link. Either way, users have no way of knowing what is a hyperlink and what isn't. Oftentimes, designers get bored and change these colors even within a site.
Very creative publishers managed to use the Netscape 1.1 extensions to create Web documents that looked like book or magazine pages. They did this by dropping in thousands of references to transparent GIFs, painful for them but even more painful for the non-Netscape-enhanced user.
Frames, introduced with Netscape 2.0, give the user the coldest plunge into unfamiliar user interface yet. The "Back" button no longer undoes the last mouse click; it exits the site altogether. The space bar no longer scrolls down; the user has to first click the mouse in the frame containing the scroll bar. Screen space, the user's most precious resource, is wasted with ads, navigation "aids" that he has never seen before, and other items extraneous to the requested document.
Thanks to all of these Netscape extensions, the Web abounds with multi-frame, multi-color, multi-interfaced sites. Unfortunately, it still isn't possible to format a novel readably. I'll use The English Patient [Ondaatje 1992] as an example. Although its narrative style is about as unconventional as you'd expect for a Booker Prize winner, it is formatted very typically for a modern novel.
Sections are introduced with a substantial amount of whitespace (3
cm), a large capital letter about twice the height of the normal font,
and the first few words in small caps. Paragraphs are typically
separated by their first line being indented about three characters.
Chronological or thematic breaks are denoted by vertical whitespace
between paragraphs, anywhere from one line's worth to a couple of
centimeters. If the thematic break has been large, it gets a lot of
whitespace and the first line of the next paragraph is not indented.
If the thematic break is small, it gets only a line of whitespace and
the first line of the next paragraph is indented. So the
"author's intent" needs to be expressed with tags like
<small-thematic-break>. The "designer's intent" needs to be expressed
with equations like small-thematic-break = one line of whitespace.
Style sheets, officially adopted as a standard on March 5, 1996 by most browser makers, make this possible in almost the manner I've described. I asked Hakon W. Lie, one of the authors of the style sheet proposal, for the most tasteful way to format The English Patient. He came back with the following:
<STYLE>
P { text-indent: 3em }
P.stb { margin-top: 12pt }
P.mtb { margin-top: 24pt; text-indent: 0em }
P.ltb { margin-top: 36pt; text-indent: 0em }
</STYLE>
<P CLASS=stb>Sample of small thematic break
<P>just an ordinary paragraph
<P CLASS=mtb>Sample of medium thematic break
<P CLASS=ltb>Sample of large thematic break
The cascading style sheet proposal that was ultimately successful rejected the idea of new tags because a document marked up with such tags would not have been valid under the HTML document type definition (DTD).
Is the formatting problem solved? I begged for style sheets in my August 1994 paper and now we have them, much better thought-out and more powerful than I envisioned. The author/designer intent split is captured nicely. So what is left to do? Style sheets don't let one publish mathematics, figures with captions, or dozens of other things facilitated by old languages like LaTeX or newer systems like Microsoft Word.
We could just add hyperlinks to LaTeX. This is more or less what a group of people at Los Alamos National Labs did a few years ago. I don't think there are really sound intellectual arguments against this approach, but sentiment seems to be on the side of keeping HTML. If we are indeed stuck with HTML, though, perhaps there is a better way to extend it.
Our methodology for extending HTML seems to be the following
I'd like to suggest an alternative approach:
An obvious reason why this wouldn't work is that the committee could never think of all the useful fields. Five years from now, people are going to want to do new, different, and unenvisioned things with the Web and Web clients. Thus, a decentralized revision and extension mechanism is essential for a structure system to be useful.
A deeper reason why this wouldn't work is that nobody would be able to write parsers and user interfaces for it. If a user is developing a Web document, does he want to see a flat list of 10,000 fields and go through each one to decide which is relevant? If you are programming a parser to do something interesting with Web documents, do you want to deal with arbitrary combinations of 10,000 fields?
meeting-announcement. Fields such as to, from, cc, and subject are
inherited from the base class message. Fields such as meeting-place
are associated with the class meeting-announcement itself.
Each message type also has an associated list of suggested types for a
reply message. For example, the suggested reply type for
meeting-announcement is request-for-information. Most importantly, the
decomposition of message types into a kind-of hierarchy allows the
automatic generation of helpful user interfaces. For example, once
the system knows that the user is writing a
lens-meeting-announcement, that determines which fields
are offered for filling and what defaults are presented. Fields
having to do with software bugs or New York Times articles are not
presented and fields such as place and time
may be helpfully defaulted with the usual room and time.
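The kind-of hierarchy described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the system described in [Malone 1987]: the `MessageType` class, the `blank_form` helper, and the placeholder default values are assumptions introduced here, and the field lists are limited to names mentioned in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MessageType:
    """One node in a kind-of hierarchy of message types (illustrative only)."""
    name: str
    parent: MessageType | None = None
    own_fields: tuple = ()
    defaults: dict = field(default_factory=dict)
    suggested_replies: tuple = ()

    def all_fields(self) -> list:
        """Inherited fields first, then the fields this type adds."""
        inherited = self.parent.all_fields() if self.parent else []
        return inherited + [f for f in self.own_fields if f not in inherited]

    def all_defaults(self) -> dict:
        """Defaults from ancestors, overridden by more specific types."""
        merged = dict(self.parent.all_defaults()) if self.parent else {}
        merged.update(self.defaults)
        return merged


message = MessageType("message", own_fields=("to", "from", "cc", "subject"))
meeting = MessageType(
    "meeting-announcement",
    parent=message,
    own_fields=("meeting-place", "time"),
    suggested_replies=("request-for-information",),
)
lens_meeting = MessageType(
    "lens-meeting-announcement",
    parent=meeting,
    # Placeholder defaults; a real system would store the actual room and time.
    defaults={"meeting-place": "usual room", "time": "usual time"},
)


def blank_form(msg_type: MessageType) -> dict:
    """Build the form an editor would offer: every field, pre-filled where possible."""
    defaults = msg_type.all_defaults()
    return {name: defaults.get(name, "") for name in msg_type.all_fields()}


form = blank_form(lens_meeting)
```

Because the form is derived from the type, fields belonging to unrelated types (software bugs, newspaper articles) never appear, and the most specific type supplies the defaults.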
What did Malone's team learn from this?
article
date
was prior to today
.
META element. META tags go in the head of
an HTML document and include information about the document as a
whole. For example
<meta name="type" content="conference-announcement">
<meta name="conference-name" content="WebNet-96">
<meta name="conference-location-brief" content="San Francisco">
<meta name="conference-location-full" content="Holiday Inn Golden Gateway Hotel, San Francisco, California, USA">
<meta name="conference-date-start" content="16 October 1996">
<meta name="conference-date-end" content="19 October 1996">
<meta name="conference-papers-deadline" content="15 March 1996">
<meta name="conference-camera-ready-copy-deadline" content="1 August 1996">
would be part of the description for our conference and would provide
enough information for entries to be made automatically in a user's
calendar.
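As a rough sketch of what a client could do with such tags, the Python below reads the META elements with the standard library's `html.parser` and turns them into a calendar entry. The `MetaCollector` class and `calendar_entry` function are names invented for this example, and the date format is assumed to be the "16 October 1996" style used above.

```python
from datetime import datetime
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from META elements."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "meta":
            attr = dict(attrs)
            if "name" in attr and "content" in attr:
                self.fields[attr["name"]] = attr["content"]


def calendar_entry(html: str):
    """Return (name, place, start, end) for a conference announcement, else None."""
    collector = MetaCollector()
    collector.feed(html)
    f = collector.fields
    if f.get("type") != "conference-announcement":
        return None
    fmt = "%d %B %Y"  # assumed format, e.g. "16 October 1996"
    return (
        f["conference-name"],
        f["conference-location-brief"],
        datetime.strptime(f["conference-date-start"], fmt).date(),
        datetime.strptime(f["conference-date-end"], fmt).date(),
    )


page = """<html><head>
<meta name="type" content="conference-announcement">
<meta name="conference-name" content="WebNet-96">
<meta name="conference-location-brief" content="San Francisco">
<meta name="conference-date-start" content="16 October 1996">
<meta name="conference-date-end" content="19 October 1996">
</head><body>...</body></html>"""

entry = calendar_entry(page)
```

A document without a recognized type simply yields no entry, so clients that understand the tags gain something and clients that do not lose nothing.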
It might not be pretty. It might not be compact. But it will work without causing any HTML level 2 client to choke.
There are a few obvious objections to this mechanism. The most
serious objection is that duplicate information must be maintained
consistently in two places. For example, if the conference organizers
decide to change the papers deadline from 15 March to 20 March,
they'll have to make that change both in the META element in the HEAD
and in some human-readable area of the BODY.
An obvious solution is to expose the field names and contents to the reader directly, as is typically done with electronic mail and as is done in [Malone 1987]. When Malone added semiformal structure to hypertext [Malone 1989], he opted to continue exposing field names directly to users. However, that is not in the spirit of the Web; stylistically, the best Web documents are supposed to read like ordinary text.
A better long-term solution is a smart editor for authors that
presents a form full of the relevant fields for the document type and
from those fields generates human-readable text in the
BODY of the document. When the author changes a field,
the text in the BODY changes automatically. Thus, no
human is ordinarily relied upon to maintain duplicate data.
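A minimal sketch of that editor's core, assuming the field names from the conference example: one dictionary of fields is the single source of truth, and both the META elements and the BODY text are generated from it. The `render_announcement` function and the wording of the generated sentence are illustrative assumptions, not part of any existing tool.

```python
from html import escape


def render_announcement(fields: dict) -> str:
    """Generate both the META elements and the human-readable BODY from one dict."""
    metas = "\n".join(
        f'<meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in fields.items()
    )
    body = (
        f"<p>{escape(fields['conference-name'])} will be held in "
        f"{escape(fields['conference-location-brief'])}. "
        f"Papers are due {escape(fields['conference-papers-deadline'])}.</p>"
    )
    return f"<html><head>\n{metas}\n</head><body>\n{body}\n</body></html>"


fields = {
    "type": "conference-announcement",
    "conference-name": "WebNet-96",
    "conference-location-brief": "San Francisco",
    "conference-papers-deadline": "15 March 1996",
}
before = render_announcement(fields)

# The organizers move the deadline: change the field once and regenerate.
fields["conference-papers-deadline"] = "20 March 1996"
after = render_announcement(fields)
```

After the change, the new deadline appears in both the HEAD and the BODY of `after`, and the old one appears in neither, without anyone editing the page by hand.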
Whatever mechanism we propose, therefore, had better allow for an organization to develop further specialized types that facilitate clever processing and presentation. At the same time, should one of these hyperspecialized documents be let loose on the wider Internet, it should carry some type information understandable to unsuspecting clients. One mechanism for doing this is the inclusion of an extra type specification:
<meta name="type" content="lanl-acl-conference-announcement">
<meta name="most-specific-public-type" content="conference-announcement">
In this case, the Los Alamos National Laboratory's Advanced Computing
Laboratory has concocted a highly specialized type of conference
announcement that permits extensive automated processing by Web
clients throughout Los Alamos. However, should someone at MIT be
looking at the conference announcement, his Web client would fail to
recognize the type lanl-acl-conference-announcement and
look at the most-specific-public-type field. As
conference-announcement is a superclass of
lanl-acl-conference-announcement, all the things that the
MIT user's client is accustomed to doing with conference announcements
should work with this one.
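The fallback a client would perform can be sketched as follows. The `handler_for` function, the handler table, and the two handler functions are hypothetical names used only for illustration; the two META field names are the ones shown above.

```python
def handler_for(meta: dict, handlers: dict):
    """Pick a handler by exact type, else by most-specific-public-type, else None."""
    for key in ("type", "most-specific-public-type"):
        doc_type = meta.get(key)
        if doc_type in handlers:
            return handlers[doc_type]
    return None


def add_to_calendar(meta):  # hypothetical generic handler
    return "generic conference handling"


def lanl_workflow(meta):  # hypothetical site-specific handler
    return "LANL-specific handling"


document = {
    "type": "lanl-acl-conference-announcement",
    "most-specific-public-type": "conference-announcement",
}

# A client at MIT knows only the public type.
mit_handlers = {"conference-announcement": add_to_calendar}
# A client inside Los Alamos also knows the specialized type.
lanl_handlers = {
    "conference-announcement": add_to_calendar,
    "lanl-acl-conference-announcement": lanl_workflow,
}

mit_choice = handler_for(document, mit_handlers)
lanl_choice = handler_for(document, lanl_handlers)
```

The same document gets the richest treatment each client can offer: the specialized handler inside Los Alamos and the generic conference handler everywhere else.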
Nonhierarchical inheritance (also known as "multiple inheritance") is
also important so that duplicate type hierarchies are not spawned.
For example, the fact that a document is restricted to a group or
company might possibly apply to any type of document. Should
there be two identical trees, one rooted at basic-document
and the other at basic-internal-document? Then we might imagine documents
for which there is an access charge. Now we need four identical
trees, rooted at basic-free-document, basic-metered-document,
basic-internal-free-document, basic-internal-metered-document. There
is a better way and it was demonstrated in the MIT Lisp Machine Flavors
system (a Smalltalk-inspired object system grafted onto Lisp around
1978): mixins. Mixins are orthogonal classes that can be combined in
any order and with any of the classes in the standard kind-of
hierarchy. Here are some example mixin classes:
Class Name | Fields Contributed | Comments |
---|---|---|
draft-mixin | | User Agent displays "****DRAFT****" prominently, offers to look up previous version and show change bars. |
restricted-mixin | (explains who can access, possibly a domain name or list of networks) | HTTP server watches for documents whose type inherits from this class and only delivers them to authorized users; non-authorized users sent an explanation with the name of a person who could authorize release. |
If there are N mixins recognized in the public type registry, we might
have to have 2^N classes for every class in the old kind-of hierarchy.
That's one for every possible subset of mixins, so we'd have classes
like travel-magazine, travel-magazine-restricted,
travel-magazine-draft, travel-magazine-draft-restricted, etc. This
doesn't seem like a great improvement on the 2^N identical trees
situation.
However, if we allow documents to specify multiple types
<meta name="types" content="travel-magazine restricted-mixin draft-mixin">
and build the final composite type at runtime in the content editor, HTTP server, and Web user agent, then we need only have one hierarchy plus a collection of independent orthogonal mixins. This presents no problem for programmers using modern computer languages such as Smalltalk and Common Lisp. These allow new type definitions at run-time and have had multiple inheritance for over a decade. A program implemented in a language that has purely static types, e.g., C++ or Java, is going to need to include its own dynamic type system, built from scratch and not based on the underlying language's type system.
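To make the runtime composition concrete, here is a small sketch in Python, another language that allows new type definitions at run-time and supports multiple inheritance. The registry, the `composite_type` and `all_fields` helpers, and the field names (`title`, `previous-version`, `access-policy`) are assumptions made for this example; only the three type names come from the text.

```python
class TravelMagazine:
    fields = ("title",)  # hypothetical field


class DraftMixin:
    fields = ("previous-version",)  # hypothetical field


class RestrictedMixin:
    fields = ("access-policy",)  # hypothetical field


# Stand-in for a public type registry.
REGISTRY = {
    "travel-magazine": TravelMagazine,
    "draft-mixin": DraftMixin,
    "restricted-mixin": RestrictedMixin,
}


def composite_type(types_attr: str) -> type:
    """Build one class from the space-separated content of the "types" META tag."""
    names = types_attr.split()
    base, *mixins = (REGISTRY[name] for name in names)
    # Mixins come first in the bases so they can override base behaviour.
    return type("+".join(names), (*mixins, base), {})


def all_fields(cls: type) -> list:
    """Gather the fields contributed by every class in the composite, base class first."""
    collected = []
    for klass in reversed(cls.__mro__):
        for name in vars(klass).get("fields", ()):
            if name not in collected:
                collected.append(name)
    return collected


Doc = composite_type("travel-magazine restricted-mixin draft-mixin")
doc_fields = all_fields(Doc)
```

Only three classes are defined, yet any of the 2^N combinations can be built on demand, and a server can ask `issubclass(Doc, RestrictedMixin)` without knowing anything about travel magazines.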
We have established, then, that we need multiple inheritance and distributed extensibility. A standard Internet approach to distributed maintenance of a hierarchy is found in the Domain Name System (DNS), where authority for a zone is parcelled out and that authority includes the ability to parcel out subzones [Stevens 1994; Mockapetris 1987a; Mockapetris 1987b].
A DNS-style type definition service might seem like overkill initially and would result in delays for pioneer users of document types. Without a substantial local cache, document type queries would have to be sent across the Internet for practically every Web document viewed. An alternative would be to have documents include their type definition code at the top or reference a URL where such a definition might be found. This is how it is done with style sheets.
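The delegation-plus-cache idea can be sketched without any networking. Everything here is hypothetical: the `Zone` and `CachingResolver` classes, the dotted naming scheme borrowed from DNS (most specific label first), and the shape of a type definition are assumptions for illustration, not a proposed protocol.

```python
class Zone:
    """Authority for one zone of a hypothetical type namespace."""

    def __init__(self) -> None:
        self.definitions = {}  # types defined directly in this zone
        self.subzones = {}     # authority delegated to other parties

    def delegate(self, label: str) -> "Zone":
        """Hand a subzone to another authority, which may delegate further."""
        return self.subzones.setdefault(label, Zone())

    def lookup(self, labels: list) -> dict:
        """Walk from this zone down to the zone that owns the final label."""
        if len(labels) == 1:
            return self.definitions[labels[0]]
        return self.subzones[labels[0]].lookup(labels[1:])


class CachingResolver:
    """Client-side resolver that asks the hierarchy only on a cache miss."""

    def __init__(self, root: Zone) -> None:
        self.root = root
        self.cache = {}
        self.remote_lookups = 0

    def resolve(self, name: str) -> dict:
        if name not in self.cache:
            self.remote_lookups += 1
            # DNS-style names put the most specific label first, so reverse them.
            self.cache[name] = self.root.lookup(name.split(".")[::-1])
        return self.cache[name]


root = Zone()
root.definitions["conference-announcement"] = {"parent": "announcement"}
acl = root.delegate("lanl").delegate("acl")
acl.definitions["conference-announcement"] = {"parent": "conference-announcement"}

resolver = CachingResolver(root)
for _ in range(3):  # viewing three documents of the same type
    definition = resolver.resolve("conference-announcement.acl.lanl")
```

Three documents of the same type cost one trip through the hierarchy; the other two are answered from the local cache, which is the property that keeps such a service from slowing down every page view.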
Regardless of how the hierarchy is maintained, developing the initial core taxonomy is a daunting task. The taxonomies developed by librarians are only a partial solution because they do not generally concern themselves with the sorts of ephemera that constitute the bulk of Internet traffic. If we don't get the core taxonomy right, we won't reap the benefits of useful standard software.
Lie, H.W., Bos, B. 1996. "Cascading Style Sheets, W3C Working Draft" (http://www.w3.org/pub/WWW/TR/WD-css1.html)
Malone, Thomas W., Grant, Kenneth R., Lai, Kum-Yew, Rao, Ramana, and Rosenblitt, David 1987. "Semistructured Messages are Surprisingly Useful for Computer-Supported Coordination." ACM Transactions on Office Information Systems, 5, 2, pp. 115-131.
Malone, Thomas W., Yu, Keh-Chiang, and Lee, Jintae 1989. "What Good are Semistructured Objects? Adding Semiformal Structure to Hypertext." Center for Coordination Science Technical Report #102. M.I.T. Sloan School of Management, Cambridge, MA
Mockapetris, P.V. 1987a. "Domain Names: Concepts and Facilities," RFC 1034
Mockapetris, P.V. 1987b. "Domain Names: Implementation and Specification," RFC 1035
Ondaatje, Michael 1992. The English Patient. Vintage International, New York
Stevens, W. Richard 1994. TCP/IP Illustrated, Volume 1: The Protocols. Addison-Wesley, Reading, Massachusetts