"Owing to the neglect of our defences and the mishandling of the German problem in the last five years, we seem to be very near the bleak choice between War and Shame. My feeling is that we shall choose Shame, and then have War thrown in a little later, on even more adverse terms than at present."Winston Churchill in a letter to Lord Moyne, 1938 [Gilbert 1991]
Don't hold your breath.
The basic idea of the Web goes back at least to 1969. Douglas Engelbart, the inventor of the mouse, developed an integrated system for computer-supported cooperative work and demonstrated it to 2000 people in a San Francisco hotel ballroom. One corner of the display contained a live video image of his collaborator, sitting at a computer terminal 30 miles away. The computer combined input from the two men's keyboards and mice to build hypertext documents and graphics. Although the system was implemented on a timesharing mainframe computer, Engelbart showed his ARPAnet interface, and distributing the collaborators across the network was the obvious next step.
[Reference: Computer Supported Cooperative Work conference 1994 videotape, ACM]
Academic computer scientists reimplemented Engelbart's ideas numerous times in the 1970s, 1980s, and 1990s, but none of them caught on. Either the code didn't really work, the researchers concerned themselves with publishing a journal paper rather than making a useful system, the system required a particular brand of computer, or there weren't enough people on the Internet to make it worth trying. A reimplementation of Engelbart's ideas by a group of C programmers at CERN, a physics lab in Switzerland, grew into the World Wide Web fad. The Web caught on like wildfire after the release of NCSA Mosaic, a program that made Web pages with pictures viewable on several different kinds of computers. The fact that we'd lost all of Engelbart's collaboration tools didn't bother anyone; where he had vector graphics we now had bitmap graphics (sort of; see below).
The syntax of HyperText Markup Language (HTML), the lingua franca of the Web, is derived from Standard Generalized Markup Language (SGML), which facilitates information exchange among databases. The fundamental idea of SGML is that instead of a document only a human can read, you mark up your document with machine-readable tags, i.e., semantic markup. For example, a corporate earnings report might look like this:
<revenue>4,700</revenue> <expenses>4,400</expenses> <profit>300</profit>HTML superficially looks the same, but the tags don't have anything to do with the content of the document. Instead of the SGML
<product>blue jeans</product> <priceUSD>25</priceUSD>we have, in HTML,
blue jeans <b>$25</b>which tells the browser to put the price in boldface but contains nothing that might help a computer program. The builders of the Web picked a structure language (SGML) but then forgot to put in any structure tags. The impoverished set of tags they chose for HTML prevents browsers from doing anything with Web documents other than formatting them.
What's wrong with that? There are plenty of formatting languages, e.g., PostScript, and nobody complains that they don't have semantic markup. Maybe that's because most formatting languages allow you to format documents readably.
A naive mind might assume that HTML was designed by a bunch of people who sat down with 100 documents of different types and said "let's not leave the table until we've put enough richness into this language to capture the authors' and designers' intent for at least 98 of the documents." Ever since the advent of the programming language C, however, this is not how software is designed. Instead of asking "how can we fulfill user requirements?" C programmers ask "how many of the features that were commonly available in 1970 can I add to my program without it crashing (too often)?"
The result? A formatting language too wimpy even for a novel.
Crack open a copy of The English Patient [Ondaatje 1992]. Although its narrative style is about as unconventional as you'd expect for a Booker Prize winner, it is formatted very typically for a modern novel. Sections are introduced with a substantial amount of whitespace (3 cm), a large capital letter about twice the height of the normal font, and the first few words in small caps. Paragraphs are not typically separated by vertical whitespace as in most Web browsers but by their first line being indented about three characters. (This makes dialog much easier to read than on the Web, by the way, where paragraph breaks cut huge gaps between short sentences and break the flow of dialog.) Chronological or thematic breaks are denoted by vertical whitespace between paragraphs, anywhere from one line's worth to a couple of centimeters. If the thematic break has been large, it gets a lot of whitespace and the first line of the next paragraph is not indented. If the thematic break is small, it gets only a line of whitespace and the first line of the next paragraph is indented.
The English Patient is not an easy book to read in paperback. It would become, however, a virtually impossible book to read on the Web because neither the author's nor the book designer's intents are expressible in HTML. Two $30,000 workstations talking to each other over a $10,000 T3 link can't deliver a document to a reader as well as the simplest word processor.
HTML's designers respond to these criticisms by saying "It's a structure language, not a formatting language. Users can set up their browsers to format documents however they like. Besides, we've been thinking about coming out with style sheets for several years now." If HTML really were a structure language instead of merely having the appearance of one, there would be a "thematic break" tag that would let browsers format documents readably. Even if there were enough structure in HTML, is it really efficient for each of 20 million people to spend five minutes designing a document badly when one professional could have spent a few days doing the job right?
As for the style sheets and the rest of the evolving "standards" from the Web committees, it remains to be seen whether anyone will be speaking standard HTML in the long run. Less than a year after the Web emerged as a consumer service, the crippled nature of HTML 2.0 led to a Tower of Babel. Scientists added hyperlink anchors to TeX so that they could transmit formulae and view their documents with custom browsers. Document experts escaped into Adobe's PDF format. Netscape added private extensions to HTML to facilitate graphic design and advertising, so now one can't look at most Web sites without using the Netscape browser.
HTML is where the Web starts and probably where it will end. The decision to choose a structure language would have been OK even if it meant pathologically ugly pages. We would have been paid back in convenience and automated systems doing our work for us. But the Web was doomed when the C programmers at CERN forgot to add any structure tags. They chose shame and got war.
Computer monitors have a non-linear response to the input voltage. If you regard an image color as a number between 0 (black) and 1 (white), a typical monitor behaves as though it exponentiates the color to the power of 2.5 before displaying it. Thus, a medium grey of 0.5 in an image turns into 0.5^2.5 = 0.177, almost black, on the screen. It is very easy to correct for this. You just adjust the input pixels by exponentiating them to the power of 1/2.5 before feeding them to the graphics system. This is known as gamma correction and is done by every Macintosh or Silicon Graphics computer.
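For the record, here is a small sketch of both the damage and the one-line fix, assuming the 2.5 exponent cited above (real monitors vary):

    #include <math.h>
    #include <stdio.h>

    /* Assumes a display gamma of 2.5, as in the text above. */
    #define DISPLAY_GAMMA 2.5

    /* What the monitor actually shows for a pixel value in [0, 1]. */
    double displayed_intensity(double pixel)
    {
        return pow(pixel, DISPLAY_GAMMA);
    }

    /* Pre-correct a pixel so that the monitor's exponentiation cancels out. */
    double gamma_correct(double pixel)
    {
        return pow(pixel, 1.0 / DISPLAY_GAMMA);
    }

    int main(void)
    {
        double grey = 0.5;
        printf("uncorrected: %.3f\n", displayed_intensity(grey));                /* 0.177 */
        printf("corrected:   %.3f\n", displayed_intensity(gamma_correct(grey))); /* 0.500 */
        return 0;
    }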
Most Unix boxes and PCs don't bother with this frill though. They figure that you paid $20,000 for a machine that can't do word processing or $4000 so that you could run Microsoft software and therefore you probably aren't smart enough to notice that half of your pictures are lost in black shadows.
Professional graphic artists generally produce Web sites on Macintoshes. They slave over the images until the mid-tones look just right, mid-tones that will be mapped almost to black on a typical Sun workstation or PC. The solution to this problem is trivial: a gamma= field in the IMG HTML tag. Given the gamma for which the image was originally targeted and the characteristics of the display on which the image is being displayed, the adjustment process is speedy and easily implemented, even in C. Why didn't the designers of the Web bother to insert the one line in the standard that would have facilitated this? Because they were C programmers, and C programmers are so cheap now that their employers don't feel like spending the extra money to buy them 24-bit displays ("be glad you have a job; if you complain, we'll hire a Bangladeshi over the Internet to replace you and pay him 2 bags of rice/week"). These poor souls had never seen an image look good on a computer screen and hence thought that standards didn't matter.
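For what it's worth, here is a hypothetical sketch of the adjustment a browser could make if IMG carried such a gamma= field. The attribute and the function names are my own invention, not anything from the HTML standard; the arithmetic simply re-targets a pixel from the gamma the image was prepared for to the gamma of the local display, either one pow() per value or a 256-entry lookup table built once per image.

    #include <math.h>

    /* Re-encode an 8-bit pixel prepared for a display with source_gamma
       so that it looks right on a display with display_gamma. */
    unsigned char retarget_pixel(unsigned char value,
                                 double source_gamma,
                                 double display_gamma)
    {
        double linear = pow(value / 255.0, source_gamma);  /* back to linear light */
        double out = pow(linear, 1.0 / display_gamma);     /* re-encode for the target */
        return (unsigned char)(out * 255.0 + 0.5);
    }

    /* Do the pow() calls once, then look the answers up per pixel. */
    void build_gamma_table(unsigned char table[256],
                           double source_gamma, double display_gamma)
    {
        int i;
        for (i = 0; i < 256; i++)
            table[i] = retarget_pixel((unsigned char) i, source_gamma, display_gamma);
    }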
You might think that these are the ravings of an aesthete, someone who labors under the misconception that the Web was built for displaying high-quality art to a discerning audience. What does it matter if a few artistic photos are distorted when what the Web is really about is cramming product brochures and catalogs down people's throats?
Well, catalog merchants go to extraordinary lengths to ensure color and tone fidelity in their printed catalogs. Kodak even makes special film for catalog photography and keeps it in its product line years after it has become obsolete because printers understand its spectral properties. Merchants who spend millions on their catalog Web sites may be overjoyed to watch the goods fly out the door, but they'll be dismayed to find that half of them fly right back because the delivered product didn't match the shade of mauve consumers saw on screen.
If the IBM 3270 design of the early 1970s appeals to you, take a trip down to the Computer Museum or load up Netscape. The forms user interface model fell into the shade after 1984, when the Macintosh "user drives" pull-down menu system was introduced. However, the Web provides balm for your nostalgia. HTML forms work exactly like the good old 3270: fill in a screenful of fields locally, hit a key, and ship the whole form off to the server in one batch.
There is a stampede of people rushing to program in Java right now. Java takes some features of Lisp from 1960 (e.g., automatic storage allocation and garbage collection) and some features from Smalltalk circa 1975 (e.g., object classes) and combines them with C syntax so that the current crop of programming drones doesn't get too bad a shock when adapting to these 20- or 35-year-old "innovations".
A naive person would probably ask "if I let an arbitrary program from the network run on my computer, what stops that program from getting into my personal files, snooping around my local network, etc.?" Sun Microsystems assures you that its Java engineering staff has thought of every contingency and that Java is completely safe. This is the same company that was unable to make its operating system's mailer secure. Thus, Robert Morris, a graduate student at Cornell, was able to write a simple program that took over every Sun Unix workstation on the Internet (plus most other Unix boxes, except for Digital's).
If you trust a Unix vendor to assure your security, please send me email. I have some waterfront property in Florida that I would like to sell you.
Academic computer scientists understood these problems decades ago, of course, and published papers on the subject. They like to remind Web weenies of this fact: "Of course, any network system will come to a halt if you don't have caching [so that 1000 copies of the same thing aren't expensively fetched 1000 times], and naturally the right way to find resources is to have a network-wide name that gets mapped by a distributed naming system to the closest available server. Didn't you read my paper in the ACM Journal of Network Algorithms in 1972?"
They say this in the same patronizing tones that they reserve for Bill Gates and Microsoft. It is true that the systems people use today would be much better if their builders had read the papers published by academicians in the 1960s and 70s. This is a little unfair to the practitioners, though, because nobody has ever shown that academic computer science journal articles have any effect on either practical computer programs or even on academic computer science research. Programmers tend to write programs that are natural extensions of systems that they've personally used, not systems described in a 10-page paper they've read. Apple and Microsoft programmers proudly add multitasking to their operating systems in the 1990s, unaware that their "innovation" was common commercial practice in the 1960s. University researchers in the 1990s build little tools for processing images, unaware that graphic artists in the 1980s were getting vastly better results with Photoshop.
Garfinkel, Simson, Daniel Weise, and Steven Strassmann 1994. The UNIX-HATERS Handbook. IDG Books.
Gilbert, Martin 1991. Churchill: A Life. Henry Holt & Company, New York, page 595.
Ondaatje, Michael 1992. The English Patient. Vintage International, New York.