So You Want to Run Your Own Server

Pig racing at the New Jersey State Fair 1995. Flemington, New Jersey.

Chapter 8: So You Want to Run Your Own Server

by Philip Greenspun, part of Philip and Alex's Guide to Web Publishing

Revised May 2008

Playing blackjack in Atlantic City (New Jersey)

At the risk of alienating nerd readers, let's take a paragraph to explain why you need a Web server. Suppose that you have developed the world's most appealing Web site. It is a collection of .html files on your laptop computer and viewing it in Netscape is delightful. This won't do other folks on the Internet much good, though. You need to transfer those files to a computer that is permanently attached to the Internet. That computer, known as a "Web server", needs to run a program, also conveniently known as a "Web server", that waits for readers (clients) to request a page. When a request comes in, the Web server program digs around in the computer's file system and delivers the requested page back to the client over the Internet. Only at this point do you have a public Web site.

Why This is Hard

If you reflect on your own Web surfing experience, you will recall many times when sites you wanted to visit were unavailable. At other times, sites that were available turned out not to be worth visiting. Fifty years after the development of the digital computer and thirty years after the development of the Internet, why is it that we so seldom encounter Web services where everything seems to be as it should be? Too many layers. Operating a Web service requires attention to the following layers:

idea
money
content | hype | users | customer service
business/tech split
service-specific software
toolkit on top of RDBMS
relational database management system (e.g., Oracle or SQL Server)
operating system (e.g., Windows XP or Unix)
TCP/IP
power

The technical layers require monitoring, care, and feeding 24x7x365. The business layers require dedicated personnel. It is beyond the scope of this chapter to talk about the publisher/business section of the layercake. We'll restrict ourselves to the lower layers.

There are three levels at which you can take responsibility for the technical layers of your public Web site:

You are a user of a remote machine.
You are the owner/administrator of a machine inside someone else's network.
You are the owner/administrator of a machine inside your own network.

Being a User on a Remote Machine

If your Web site is on a remote machine that someone else administers, your only responsibility is periodically transferring your files there. As soon as the remote server is behaving the way you want it to, you can walk away until it is time to update your site. You can go away on vacation for two months at a stretch. If you need expensive software, such as a relational database, you can simply shop for a site-hosting service that includes use of that software as part of a package.

Most Internet Service Providers (ISPs), including universities, will give customers a free Web site on one of their servers. You won't be able to do anything fancy with relational databases. You will have a URL that includes the ISP's domain name, e.g., http://members.aol.com/netspringr/ or http://www.mit.edu/people/thomasc/. If you are using a commercial ISP, they will usually be willing to sell you "domain-level service" for $50 to $100 a month. So now instead of advertising http://members.aol.com/greenspun/ as your public site, you advertise http://www.greenspun.com. Instead of "lovetochat@aol.com", you can publish an email address of "philip@greenspun.com". Domain-level service gives you a certain degree of freedom from your ISP. You can unsubscribe from AOL and all of the links to your site that you've carefully nurtured over the years will still work.

Even if you have your own domain, there are some potential downsides to having it parked on someone else's physical computer. You are entirely at the mercy of the system administrators of the remote machine. If e-mail for your domain isn't being forwarded, you can't have your own system administrator fix the problem or go digging around yourself--you have to wait for people who might not have the same priorities that you do. If you are building a sophisticated relational database-backed site, you might not have convenient access to your data because you aren't allowed shell access to the remote machine, only the ability to FTP and HTTP PUT files.

If you want your own domain and basic Web hosting, good starting points are Yahoo Domains, GoDaddy, and www.register.com.

Your Machine/Their Network

If you are the owner of a Web-serving computer inside someone else's network you have total freedom to make changes to your configuration and software. You can use the machine for software development or other experiments. Whoever is hosting your box is responsible for network connectivity. If packets aren't getting to your machine, they'll probably notice and do something about it. You don't have to pay the $500 per month cost of a T1 line yourself.

The downside to running your own box is that you have to carefully watch over your computer and Web server program. Nobody else will care about whether your computer is serving pages. You'll have to carry a pager and sign up for a service like Uptime (http://uptime.openacs.org/uptime/) or http://www.redalert.com/ that will email you and beep you when your server is unreachable. You won't be able to go on vacation unless you find someone who understands your computer and the Web server configuration to watch it. Unless you are using free software, you may have to pay shockingly high licensing fees. Internet service providers charge between $250 a month and $4,000 a month for physical hosting, depending on who supplies the hardware, how much system administration the ISP performs, and how much bandwidth your site consumes.

The people who do this usually call the service co-location. The better vendors have terminal concentrator in reverse so you can telnet into one of their boxes and then connect to the serial port on the back of your Unix box, just as though you were at the machine's console. For the Windows crowd, the good colo providers have a system whereby you can visit a Web page and power cycle your machine to bring it back up. If all else fails, they have technicians in their facilities 24x7. Most colo providers have redundant connectivity and private peering with Internet backbone operators. If you go on a tour they'll show you their huge bank of batteries and back-up natural gas generator. If this seems like overkill, here's an email that we got a few days after moving a server from MIT to a colo facility:

From: William Wohlfarth <wpwohlfa@PLANT.MIT.EDU>
Subject: Power Outage August 7 , 1997
Date: Fri, 8 Aug 1997 07:39:43 -0400

At approximately 5:35pm on August 7, 1997 a manhole explosion in the
Kendall Sq area caused Cambridge Electric to lose Kendall Station. MIT
lost all power and the gas turbine tripped. Power was fully restored at
7pm.

At approximately 7:05pm, a second manhole explosion,caused 1 fatality,
and injuries to 4 other Cambridge Electric utilitymen including a
Cambridge Policeman. Putnam Station was also tripped and MIT lost all
power again.

At approximately 10:30pm, MIT had power restored to all buildings within
our distribution system. Several East campus ( E28, E32, E42, E56, E60,
NE43)and North/Northwest buildings( N42, N51/52, N57, NW10, NW12, NW14,
NW15, NW17, NW20, NW21, NW22, NW30, NW61, NW62, W11, WW15) which are fed
directly from Cambridge Electric, were restored by Cambridge Electric
personal.

Cambridge Electric is still sorting out the chain of events. At last
discussions with them, a total of 3 manhole explosions had taken place.
Additional information will be posted when available.

Their Machine/Their Network

The most modern way to support a service is with a percentage of Amazon or Google's cloud of servers. With the Google App Engine or Amazon Elastic Compute Cloud you complete your software to run on an idealized unit of server power within Google or Amazon's data center. If you are featured on network television and you've coded your application so that there is no central database bottleneck, you can scale it up to handle an enormous number of users without investing in hardware. When the fad dies, the virtual servers are returned to Google or Amazon.

Traditional someone else's computer and someone else's network services are provided by hosting.com, rackspace.com, and a range of competitors.

Your Machine/Your Network

If you are the owner of a machine inside your own network you can sit right down at the console of your Web server and do software development or poke through the database. This can be a substantial convenience if you are running an RDBMS-backed Web site and there is no obvious way to have development and production sites. If you are already paying to have a high-speed network connection to your desktop then the marginal cost of hosting a server this way may be zero, a fact that was not lost on university students throughout the early years of the Web. The downside is that you have all of the responsibilities and hassles of owning a server physically inside someone else's network, plus all of the responsibilities and hassles of keeping the network up. Note that for text-oriented Internet services a home DSL connection might be perfectly adequate for supporting a community of a few thousand souls.

Choosing a Computer

Wedding video shoot. World Financial Center. New York City.

Hardware engineers have done such a brilliant job over the last 40 years that nobody notices that, in the world of commercial software, the clocks slowed to a crawl in 1957. Society can build a processor for $50 capable of executing 200 million instructions per second. Marvelous. With computers this powerful, the amazing thing is that anyone still has to go into work at all. Perhaps part of the explanation for this apparent contradiction is that, during its short life, the $50 chip will consume $10,000 of system administration time.

Most people buying a server computer make a choice between Unix and Windows. These operating systems offer important 1960s innovations like multiprocessing and protection among processes and therefore either choice can be made to work. Certainly the first thing to do is figure out which operating system supports the Web server and database management software that you want to use. If the software that appeals to you runs on both operating systems, make your selection based on which computer you, your friends, and your coworkers know how to administer. If that doesn't result in a conclusion, read the rest of this section.

Unix

Hydroelectric plant on the Connecticut River in Vernon, Vermont.

Unix is the ultimate open-source poster child. Unlovely and incompetent at its birth in 1970 at Bell Labs, it ultimately triumphed over every other operating system even while it lacked beloved features. Unix was one of the first operating systems written in a high-level language. This made Unix easy to port to a new processor architecture and it also made Unix easier for non-wizard programmers to extend. What's more, though Unix was not free, Bell Labs made it relatively easy for universities and computer manufacturers to get hold of the source code. Whenever any company built a new processor (this was in the days before the Intel monopoly), it was always easier to run Unix on it than write a new operating system from scratch.

Setting up a Unix server properly requires an expert with years of experience. The operating system configuration resides in hundreds of strangely formatted text files, which are typically edited by hand. One of the good things about a Unix server is stability. Assuming that you never touch any of these configuration files, the server will keep doing its job more or less forever.

Which Brand of Unix?

Cooling water flowing out of the Vermont Yankee nuclear power plant and back into the Connecticut River. These concrete structures are designed to aerate the water and make it more useful to the life of the river. Vernon, Vermont.

The companies making Unix machines were accustomed to building proprietary systems and hence they couldn't resist introducing annoying incompatibilities among their versions of Unix. A program written for Digital Unix wouldn't run on HP Unix, IBM's AIX, or Sun Unix. A 100,000-line program written for Sun's Solaris operating system might require the modification of only 10 lines to run on HP Unix but there was no easy way to tell which 10 lines (note the similarity between the Unix world of the 1980s and 1990s and the current sort-of-compatible cloud of Java application server vendors).

After all was said and done, Sun's version of Unix proved to be the winner just in time for the entire commercial Unix market to be rendered irrelevant by the open source movement.

The Free Software Foundation, started in 1983 by Richard Stallman, developed a suite of software functionally comparable to the commercial Unices. This software, called GNU for "GNU is Not Unix", was not only open-source but also free and freely alterable and improvable by other programmers. Linus Torvalds built the final and most critical component (the kernel) and gave the finished system its popular name of "Linux", although it would be more accurate to call the system "GNU/Linux".

The GNU/Linux world is a complex one with incompatible distributions available from numerous vendors. Here's how an expert described the different options: "People swear by SuSE. Everyone hates RedHat, but they are the market leaders. Debian is loved by nerds and geeks. Mandrake is to give to your momma to install."

Windows

Windows NT crashed.
I am the Blue Screen of Death.
No one hears your screams.
-- Peter Rothman (in Salon Magazine's haiku error message contest)

The name "Microsoft Windows" has been applied to two completely different product development streams. The first was Win 3.1/95/98. This stream never resulted in a product suitable for Internet service or much of anything else. The second stream was Windows NT/2000/XP. Windows NT 3.1 was released in 1993 and did not work at all. NT 3.51 was released in 1995 and worked reasonably well if you were willing to reboot it every week. Windows 2000 was the first Microsoft operating system that was competitive with Unix for reliability. Windows XP contained some further reliability improvements and a hoarde of fancy features for desktop use. Windows .NET Server can be described as "XP for servers" and is just being released now (late 2002).

Windows performance and reliability were so disappointing for so long that Microsoft servers still have a terrible reputation among a lot of older (25+) computer geeks. This reputation is no longer justified however and has been demonstrated to be false by a large number of high-volume Web services that operate from Microsoft servers.

Open-source purists don't like the fact that Windows is closed-source. However, if your goal is to be innovating with an Internet service it is very unlikely that your users will be well served by you mucking around in the operating system source code. The OS is several layers below anything that matters to a user.

GNU/Linux versus Windows

GNU/Linux is good for zero-budget projects. You find a scrap PC, load Mandrake, download an open-source relational database management system such as PostgreSQL, and start coding.

Microsoft Windows is good for zero-time projects. You find someone who knows how to read a menu and click a mouse. That person should be able to put up a basic Web service on an XP or .NET Server machine without doing any programming or learning any arcane languages.

GNU/Linux is good for enormous server farms and extreme demands. The folks who run search engines such as Google can benefit by modifying the operating system to streamline the most common tasks. Also if you have 10,000 machines in a rack the savings on software license fees from running a free operating system are substantial.

Microsoft Windows is good for its gentle slope from launching a static Web site to a workgroup to a sophisticated RDBMS-backed multi-protocol Internet service. A person can start with menus and programs such as Microsoft FrontPage and graduate to line-by-line software development when the situation demands.

GNU/Linux is good for embedded systems. If you've built a $200 device that sits on folks' television sets you don't want to pay Microsoft a $10 tax on every unit and you don't want to waste time and money negotiating a deal with Microsoft to get the Windows source code that you'll need to make the OS run on your weird hardware. So you grab GNU/Linux and bend it to the needs of your strange little device.

In our course "Software Engineering for Internet Applications", about half of the students choose to work with Windows and half with GNU/Linux. The students are seniors at the Massachusetts Institute of Technology, majoring in computer science. As a group the Windows users are up and running much faster and with much more sophisticated services. A fair number of GNU/Linux users have such difficulty getting set up that they are forced to drop out of the course. By the end of the semester the remaining students do not seem to be much affected by their choice of operating system. Their struggles are with the data model, page flow, and user experience.

Final Hardware Selection Note

Whatever server computer you buy, make sure that you get an uninterruptible power supply and mirrored disks on separate controllers. An Internet application need not go offline because of a power glitch or a disk failure.

How Much Capacity?

Rock of Ages factory. Graniteville, Vermont. This was the factory where the Vietnam Memorial was built and inscribed

The typical Web site saga goes as follows:

Publisher spends $500,000 developing site.
Site attracts 50 users per day.
Publisher spends $100,000 on promotion.
Site gets hyperlinked from cnn.com, resulting in five new users per second.
All the new users are greeted with a "server not responding" error message.

Do you size for your usual traffic or for "the big day"? If you have a static Web site, you should always size for the big day. A pizza-box size computer running an efficient threaded server program won't start slowing down until requests reach the 2 million-per-day mark. If you're on your own T1 line, you'll find that 2 million hits per day for 7 KB files, a typical size for a small JPEG image, will just fill it up. That is one reason to park your server at a co-location facility where any extra network demand will be easily handled.

What if you have a database-backed site? If you're doing an SQL query for every page load, don't count on getting more than 500,000 hits a day out of a cheap computer (alternative formulation: no more than 10 requests per second per processor). Even 500,000 a day is too much if your RDBMS installation is regularly going out to disk to answer user queries. In other words, if you are going to support 10 to 20 SQL queries per second, you must have enough RAM to hold your entire data set and must configure your RDBMS to use that RAM. You must also be using a Web/DB integration tool like AOLserver or Microsoft IIS that does not spawn new processes or database connections for each user request.

What if your site gets linked from yahoo.com, aol.com, abcnews.com, and cnn.com ... all on the same day? Should you plan for that day by ordering an 8-CPU Oracle server, a rack of Web servers, and a load balancing switch? It depends. If you've got unlimited money and sysadmin resources, go for the big iron. Nobody hands out prizes in the IT world for efficient use of funds. People never adjust for the effort involved in putting together and serving a site. Consider the average catalog shopping site, which cost the average company between $7 million and $35 million according to a mid-1999 Interactive Week survey. Users connecting to the site never say "My God, what a bunch of losers. They could have had a high-school kid build this with Microsoft .NET and the free IBuySpy demo toolkit in about a week." The only thing that a user will remember is that they placed their order and got their Beanie Baby. Assuming that the site was built by a contractor for a publisher, the only thing that the publisher will remember is whether or not it was built on time.

More: See the "Scaling Gracefully" chapter in Internet Application Workbook (http://philip.greenspun.com/internet-application-workbook/).

Server Software

Once you have bought a Web server computer, you need to pick a Web server program. The server program listens for network connections and then delivers files in response to users' requests. These are the most important factors in choosing a program:

quality of application programming interface (API)
tools for connecting to relational database management systems (RDBMS)
support and source code availability
availability of shrink-wrap plug-in software packages
speed

Each of these factors needs to be elaborated upon.

API

Unless your publishing ambition is limited to serving static files, you will eventually need to write some programs for your Web site. It is quite possible that you'll need to have a little bit of custom software executing every time a user requests any file. Any Web server program can invoke a common-gateway interface (CGI) script. However, CGI scripts impose a tremendous load on the server computer (see the server-side programming chapter). Furthermore, an all-CGI site is less straightforward for authors to maintain and for search engines to search than a collection of HTML files.

A Web server API makes it possible for you to customize the behavior of a Web server program without having to write a Web server program from scratch. In the early days of the Web, all the server programs were free. You would get the source code. If you wanted the program to work differently, you'd edit the source code and recompile the server. Assuming you were adept at reading other peoples' source code, this worked great until the next version of the server came along. Suppose the authors of NCSA HTTPD 1.4 decided to organize the program differently than the authors of NCSA HTTPD 1.3. If you wanted to take advantage of the features of the new version, you'd have to find a way to edit the source code of the new version to add your customizations.

An API is an abstraction barrier between your code and the core Web server program. The authors of the Web server program are saying, "Here are a bunch of hooks into our code. We guarantee and document that they will work a certain way. We reserve the right to change the core program but we will endeavor to preserve the behavior of the API call. If we can't, then we'll tell you in the release notes that we broke an old API call."

An API is especially critical for commercial Web server programs where the vendor does not release the source code. Here are some typical API calls from the AOLserver documentation (http://www.aolserver.com):

ns_passwordcheck user password returns 1 (one) if the user and password combination is legitimate. It returns 0 (zero) if either the user does not exist or the password is incorrect.
ns_sendmail to from subject body sends a mail message

Originally AOLserver was a commercial product. So the authors of AOLserver wouldn't give you their source code and they wouldn't tell you how they'd implemented the user/password database for URL access control. But they gave you a bunch of functions like ns_passwordcheck that let you query the database. If they redid the implementation of the user/password database in a subsequent release of the software then they redid their implementation of ns_passwordcheck so that you wouldn't have to change your code. The ns_sendmail API call not only shields you from changes by AOLserver programmers, it also allows you to not think about how sending e-mail works on various computers. Whether you are running AOLserver on Windows, HP Unix, or Linux, your extensions will send e-mail after a user submits a form or requests a particular page.

AOLserver is now open-source, with source code and a developer's community available from www.aolserver.com.

Aside from having a rich set of functions, a good API has a rapid development environment and a safe language. The most common API is for the C programming language. Unfortunately, C is probably the least suitable tool for Web development. Web sites are by their very nature experimental and must evolve. C programs like Microsoft Word remain unreliable despite hundreds of programmer-years of development and thousands of person-years of testing. A small error in a C subroutine that you might write to serve a single Web page could corrupt memory critical to the operation of the entire Web server and crash all of your site's Web services. On operating systems without interprocess protection, such as Windows 95 or the Macintosh, the same error could crash the entire computer.

Even if you were some kind of circus freak programmer and were able to consistently write bug-free code, C would still be the wrong language because it has to be compiled. Making a small change in a Web page might involve dragging out the C compiler and then restarting the Web server program so that it would load the newly compiled version of your API extension.

By the time a Web server gets to version 2.0 or 3.0, the authors have usually figured that C doesn't make sense and have compiled in an interpreter for Tcl, Visual Basic, Java byte codes, or JavaScript.

RDBMS Connectivity

You've chosen to publish on the Web because you want to support collaboration among users and customize content based on each individual user's preferences and history. You see your Web site as a lattice of dazzling little rubies of information. The Unix or Windows file system, though, only understands burlap sacks full of sod. As you'll find out when you read the chapters on building database-backed Web sites, there aren't too many interesting things that you can implement competently on top of a standard file system. Sooner or later you'll break down and install a relational database management system (RDBMS).

You'll want a Web server that can talk to this RDBMS. All Web servers can invoke CGI scripts that in turn can open connections to an RDBMS, execute a query, and return the results formatted as an HTML page. However, some Web servers offer built-in RDBMS connectivity. The same project can be accomplished with much cleaner and simpler programs and with a tenth of the server resources.

Support and Source Code Availability

Tree branch at Glen Ellis Falls on Rt. 16 in New Hampshire

Most computer programs that you can buy in the 1990s are copies of systems developed in the 1960s. Consider the development of a WYSIWYG word processor. A designer could sit down in 1985 and look at ten existing what-you-see-is-what-you-get word processors: Xerox PARC experiments from 1975, MacWrite, workstation-based systems for documentation professionals (such as Interleaf). He would not only have access to the running programs but also to user feedback. By 1986 the designer hands off the list of required features to some programmers. By 1987, the new word processor ships. If enough of the users demand more sophisticated features, the designers and programmers can go back to Interleaf or Frame and see how those features were implemented. Support consists of users saying "it crashes when I do x," and the vendor writing this information down and replying "then don't do x." By 1989, the next release of the word processor is ready. The "new" features lifted from Interleaf are in place and "doing x" no longer crashes the program.

Did this same development cycle work well for Web server programs? Although the basic activity of transporting bits around the Internet has been going on for three decades, there was no Web at Xerox PARC in 1975. The early Web server programs did not anticipate even a fraction of user needs. Web publishers did not want to wait years for new features or bug fixes. Thus an important feature for a Web server circa 1998 was source code availability. If worst came to worst, you could always get an experienced programmer to extend the server or fix a bug.

At the time of this revision, November 2002, two factors combined to make source code availability irrelevant for most Internet application developers: (1) Web server programs today address all the most important user needs that were identified between 1990 and 2001, and (2) the pace of innovation among publishers and developers has slowed to the point where commercial software release cycles are reasonably in step with user goals.

If you're building something that is unlike anything else that you've seen on the public Internet, e.g., a real-time multiuser game with simultaneous voice and video streams flying among the users, you should look for a Web server program whose source code is available. If on the other hand you're building something vaguely like amazon.com or yahoo.com it is very likely that you'll never have the slightest interest in delving down into the Web server source.

Availability of Shrink-wrap Plug-ins

Are your ideas banal? Is your Web site like everyone else's? If so, you're a good candidate for shrink-wrapped software. If Microsoft Sharepoint is the right toolkit for your users you'll want to be running Microsoft IIS.

Speed

It is so easy now to get a high-efficiency server program that speed is no longer a significant discriminant. In ancient times, the Web server forked a new process every time a user requested a page, graphic, or other file. The second generation of Web servers pre-forked a big pool of processes, e.g., 64 of them, and let each one handle a user. The server computer's operating system ensured that each process got a fair share of the computer's resources. A computer running a pre-forking server could handle at least three times the load. The latest generation of Web server programs uses a single process with internal threads. This has resulted in another tripling of performance.

It is possible to throw away 90 percent of your computer's resources by choosing the wrong Web server program. Traffic is so low at most sites and computer hardware so cheap that this doesn't become a problem until the big day when the site gets listed on the Netscape site's What's New page. In the summer of 1996, that link delivered several extra users every second at the Bill Gates Personal Wealth Clock (http://philip.greenspun.com/WealthClock). Every other site on Netscape's list was unreachable. The Wealth Clock was working perfectly from a slice of a 70 Mhz pizza-box computer. Why? It was built using AOLserver and cached results from the Census Bureau and stock quote sites in AOLserver's virtual memory.

Given the preceding criteria, let's evaluate the three Web server programs that are sensible choices for a modern Web site (see "Interfacing a Relational Database to the Web" for a few notes on why the rest of the field was excluded).

AOLserver

America Online is gradually converting its services from proprietary protocols to HTTP. When you have 35 million users, life begins at 50 million hits/day/service and you don't have too much patience for bogus "scalable application server" products. Back in 1995, a small company in Santa Barbara called Navisoft was making the most interesting Web server program. America Online bought the whole company just so that they could use the product in-house. They changed the name of the program to AOLserver and cut the price of the program from $5000 to $0.

AOLserver has a rich and comprehensive set of API calls. Some of the more interesting ones are the following:

ns_sendmail (sends e-mail)
ns_httpget (grabs a Web page from another server)
ns_schedule_daily (specifies a procedure to be run once a day)

These kinds of API calls let you write sophisticated Web/e-mail/database systems that are completely portable among different operating systems.

These are accessible from C and, more interestingly, from the Tcl interpreter that they've compiled into the server. It is virtually impossible to crash AOLserver with a defective Tcl program. There are several ways of developing Tcl software for the AOLserver but the one with the quickest development cycle is to use *.tcl URLs or *.adp pages.

A file with a .tcl extension anywhere among the .html pages will be sourced by the Tcl interpreter. So you have URLs like "/bboard/fetch-msg.tcl". If asking for the page results in an error, you know exactly where in the Unix file system to look for the program. After editing the program and saving it in the file system, the next time a browser asks for "/bboard/fetch-msg.tcl" the new version is sourced. You get all of the software maintenance advantages of interpreted CGI scripts without the CGI overhead.

What the in-house AOL developers like the most are their .adp pages. These work like Microsoft Active Server Pages. A standard HTML document is a legal .adp page, but you can make it dynamic by adding little bits of Tcl code inside special tags.

AOLserver programmers also use Java and PHP.

Though AOLserver shines in the API department, its longest suit is its RDBMS connectivity. The server can hold open pools of connections to multiple relational database management systems. Your C or Tcl API program can ask for an already-open connection to an RDBMS. If none is available, the thread will wait until one is, then the AOLserver will hand your program the requested connection into which you can pipe SQL. This architecture improves Web/RDBMS throughput by at least a factor of ten over the standard CGI approach.

Source code for AOLserver is available from www.aolserver.com.

The most substantial shrink-wrapped software package specifically for AOLserver is OpenACS, available from www.openacs.org, a toolkit for building online communities.

Apache

Proceeding alphabetically, we arrive at the most popular Web server, Apache, which is running behind roughly half the world's Web sites. Apache seems to be used at the very simplest and the very most complex Web sites. The simple users just want a free server for static files. The complex users basically need a custom Web server but don't want to start programming from scratch.

The Apache API reflects this dimorphism. The simple users aren't expected to touch it. The complex users are expected to be C programming wizards. So the API is flexible but doesn't present a very high level substrate on which to build.

Support for Apache is as good as your wallet is deep. You download the software free from http://www.apache.org and then buy support separately from the person or company of your choice. Because the source code is freely available there are thousands of people worldwide who are at least somewhat familiar with Apache's innards.

There don't seem to be too many shrink-wrapped software packages for Apache that actually solve any business problem as defined by a business person. If you're a programmer, however, you'll find lots of freeware modules to make your life easier, including some popular ones that let you

embed a Perl or Python interpreter in the server and thus avoid CGI overhead
do user authentication from an RDBMS
add SSL support (encrypted sessions)
run Java Servlets
throttle back the usage on a per-user basis (very useful for ISPs)
run FastCGI scripts
add a server-parsed scripting language (ePerl or PHP)

See modules.apache.org for a current list.

Apache is a pre-forking server and is therefore reasonably fast.

Microsoft IIS/ASP

For people who want to use Microsoft Windows, Internet Information Server (IIS) is the natural choice. Active Server Pages (ASP) has the right architecture for Web service development. There is a reasonably gentle slope from plain-old-HTML to full-out custom program. ASP provides high-performance connectivity to relational databases.

Source code availability is a weak point for IIS. If it doesn't do what you want you're stuck. On the plus side, however, the API for IIS is comprehensive and is specially well suited to the Brave New World of Web Services (computers talking to other computers using HTTP and XML-format requests).

Virtually everyone selling shrink-wrapped software for Web services offers a version for IIS, including Microsoft itself with a whole bunch of packages.

IIS is one of the very fastest Web server programs available.

Connectivity

Sheep at the New Jersey State Fair 1995. Flemington, New Jersey.

A 64-processor computer with 64GB of RAM in your living room is an impressive personal Web server, but not if it has to talk to the rest of the Internet through a 56K modem. If you aren't going to park your server at a colocation facility you need to think about higher-speed connectivity to your premises.

Your First T1

If you want to join the club of Real Internet Studs, you need a T1 line at a minimum. This is typically a 1.5-Mbps dedicated connection to somebody's backbone network. You generally get the physical wire from the local telephone monopoly and they hook it up to the Internet service provider of your choice. The cost varies by region. In most parts of the U.S. you can get a full T1 for $600-800 per month.

For Web service, it is risky to rely on anyone other than a backbone network for T1 service. It is especially risky to route your T1 through a local ISP who in turn has T1 service from a backbone operator. Your local ISP may not manage their network competently. They may sell T1 service to 50 other companies and funnel you all up to Sprint through one T1 line.

You'll have to be serving more than 500,000 hits a day to max out a T1 line (or be serving 50 simultaneous real-time audio streams). When that happy day arrives, you can always get a 45-Mbps T3 line for about $15,000 a month.

If your office is in a city where a first-class colocation company does business, an intelligent way to use the T1 infrastructure is to co-locate your servers and then run a T1 line from your office to your co-location cage. For the price of just the "local loop" charge (under $500 per month), you'll have fast reliable access to your servers for development and administration. Your end-users will have fast, reliable, and redundant access to your servers through the colocation company's connections.

Cable Modems and ADSL

Overturned Golf. Entering the Wicklow Mountains south of Dublin, Ireland.

The cable network is almost ideal topologically for providing cheap Internet. There is one wire serving 100 or 200 houses. Upstream from this there is a tree of wires and video amplifiers. The cable company can say, "We declare the wire going to your house to be a Class C subnet." Then they put a cable modem in your house that takes Ethernet packets from the back of your computer and broadcasts them into an unused cable channel. Finally, the cable company just needs to put an Internet router upstream at every point where there is a video amp. These routers will pull the Internet packets out of the unused cable channels and send them into one of the usual Internet backbones.

If all of the folks on your block started running pornographic Web servers on their Linux boxes then you'd have a bit of a bandwidth crunch because you are sharing a 10-Mbit channel with 100 other houses. But there is no reason the cable company couldn't split off some fraction of those houses onto another Ethernet in another unused video channel.

Cable modems are cheap and typically offer reasonable upload speeds of 384 Kbps (remember that it is the upload speed that affects how well your cable modem connection will function for running a Web service). There are some limitations with cable modems that make it tough to run a Web server from one. The first is that the IP address is typically dynamically assigned via DHCP and may change from time to time. This can be addressed by signing up to dyndns.org and writing a few scripts to periodically update the records at dyndns.org with your server's IP address. When a user asks to visit "www.africancichlidclub.org" the dyndns.org translates that hostname to the very latest IP address that it has been supplied with. A limitation that is tougher to get around is that some cable companies block incoming requests to port 80, the standard HTTP service port. This may be an effort to enforce a service agreement that prohibits customers from running servers at home. It may be possible to get around this with a URL forwarding service that will bounce a request for "www.africancichlidclub.org" to "24.128.190.85:8000".

An alternative to cable is Digital Subscriber Line or "DSL". Telephony uses 1 percent of the bandwidth available in the twisted copper pair that runs from the central phone company office to your house ("the local loop"). ISDN uses about 10 percent of that bandwidth. With technology similar to that in a 28.8 modem, ADSL uses 100 percent of the local loop bandwidth, enough to deliver 6 Mbps to your home or business. There are only a handful of Web sites on today's Internet capable of maxing out an ADSL line. The price of ADSL is low because the service competes with cable modems, i.e., $20-50 per month. Telephone monopolies such as Verizon often impose restrictions on the use of the DSL line that make it impossible to run a home server. Look for competitive local exchange carriers (CLECs), smaller companies that provide DSL service from the same physical phone company office. An example of a CLEC is www.covad.com, which sells a fixed IP address for $70 per month (1.5 Mbps down and 384 Kbps up).

The Big Picture

"Sprint's ION integrates voice, video, and data over one line," said Charles Flackenstine, a manager of technology services at Sprint. "For small and medium businesses it leverages the playing field, giving them the capability to become a virtual corporation."
-- news.com (June 2, 1998)

Processing power per dollar has been growing exponentially since the 1950s. In 1980, a home computer was lucky to execute 50,000 instructions per second and a fast modem was 2,400 bps. In 1997, a home computer could do 200 million instructions per second but it communicated through a 28,800-bps modem. We got a 4,000-fold improvement in processing power but only a tenfold improvement in communication speed.

The price of bandwidth is going to start falling at an exponential rate. Our challenge as Web publishers is to figure out ways to use up all that bandwidth.

Chapter Summary

There are a bunch of layers that go into making a Web service work. You have to decide where your expertise lies and therefore which of these layers you ought to manage and which to outsource.
If you use Microsoft Windows on your desktop it probably makes sense to use Microsoft Windows for your Internet application server.
The most important factors in choosing a Web server program are the quality of its API, its ability to connect to a relational database management system, and whether or not the source code is available.
Bandwidth is cheap and getting rapidly cheaper, notably because of cable modems. Don't make too many decisions based on the assumption that users will be connecting at 56K.

To understand computer networks, start with TCP/IP Illustrated, Volume 1 (W. Richard Stevens 1994; Addison-Wesley), one of the best books on any engineering subject.
To understand processors and systems, read Computer Organization and Design (Patterson and Hennessy 1998; Morgan Kaufmann), Computer Architecture: A Quantitative Approach (Hennessy and Patterson 1996; Morgan Kaufmann), and Computer Architecture : Concepts and Evolution (Blaauw and Brooks 1997; Addison-Wesley). These are the standard textbooks in the field, written by people whose ideas have been built into every modern computer (e.g., Hennessy and Patterson were prime forces behind reduced instruction set computers (RISC)).
To understand Windows NT, read Inside Windows 2000 (David Solomon and Mark Russinovich 2000; Microsoft Press).
UNIX Power Tools (Peek, et al. 2002; O'Reilly) is a very useful introduction to using Unix.
http://www.cs.purdue.edu/coast/ and http://www.cert.org/ are good places to start learning about computer security. Dan Farmer, author of the SATAN security checker/cracker, found in 1996 that 60 percent of Internet servers could be easily compromised (see http://www.trouble.org/survey/).

or move on to Chapter 9: User Tracking

philg@mit.edu

Add a comment | Add a link