Note: This content has been superseded to a large extent by four chapters of my new book.
IBM decided that this was too efficient for their new, fast, expensive mainframe computers. In the 1970s, they therefore invented relational database management systems (RDBMS) and the declarative query language SQL. A declarative language allows the user to say what he wants and then the RDBMS query planner figures out how to get it. The RDBMS is allowed to store the data however it wants.
It is much harder to have "a little bug" in an SQL query than in an arbitrary program. Generally it either describes the data that you want and works or it completely fails in an obvious way.
The main problem with RDBMSes is performance. If your query runs too slow, you have to try to figure out what this enormously complicated program is doing. You didn't write this program and you don't have the source code. You then have to figure out what different SQL incantations will get you the same set of data back but faster.
Of course, you could also buy a bigger mainframe... (or keep running the non-relational software you used in the 1960s, which is what the airlines do for their reservations systems)
If you have a big database of information that seldom changes, you may be much better off writing a program to extract it from the database and generate static HTML pages. You can repeat this process periodically and the net result is that you will consume 1% of the computer resources that you would have if you'd done the queries on-line. This is how Yahoo works, for example.
You need an RDBMS if you have data that changes frequently. Maybe you are allowing users to add themselves to a mailing list. Maybe you want to let 10 different people update a patient's medical record. It is possible to write perl, C, or Lisp programs that maintain this information in ordinary files but if you make a mistake in one of these programs, it is possible that attempted simultaneous updates will trash your database and/or that one update will be lost. Relational databases have concurrency control and make sure that all the updates happen without any tables getting corrupted.
For a traditional RDBMS-backed Web site, the RDBMS client is the Web server program (e.g., NaviServer 2.0) or maybe a CGI script that has been spawned in response to a user request for a URL. The user types something into a form on a Web client (e.g., Netscape Navigator) and that gets transmitted to the Web server (e.g., NCSA 1.4) which is itself or spawns an RDBMS client (e.g., perl script) which has or opens a connection to an RDBMS server (e.g., Oracle). The data then goes back from the RDBMS server to the RDBMS client to the Web server which sends it back to the Web client.
Does that sound complicated and slow? Well, yes it is, but it can be fast enough. I interfaced the 60 GB Boston Children's Hospital to the Web this way using the rudest of tools (Oracle 6, Oraperl, NCSA 1.3, ancient Sun SPARC). Try it out.
You could probably write all this into the API of a lot of servers. The problem is that servers tend to use C as an extension language. That means your software development cost will be prohibitive unless you shop the work out to Bangalore. Even if you are rich, if one of your C drones makes a typo, the little API extension will run wild in memory and bring down all of your Web services, not just break that one URL as in the case of a bad CGI script.
If this frightens you, you could just use NaviServer 2.0. Wizards Jim Davidson and Doug McKee have done all of the above work for you and it has been tested in hundreds of production sites since May 1995. I have personally used this software for several heavily used sites and feel that it is without question the best way to go for RDBMS-backed sites. You can write all of your code in C or Tcl and talk to the database. The Tcl runs inside the NaviServer's process and imposes no significant overhead (Tcl is a slug but that is negligible compared to the cost of going out to the RDBMS). [Note: I maintain a server of NaviServer example apps, complete with source code.]
The NaviServer Tcl API was introduced in early 1995. It remains for me the most efficient and programmer-friendly way of talking to an RDBMS. There are some competitors finally reaching the marketing, though, e.g., Netscape Livewire.
If you are building a richly interactive site and want the ultimate in user responsiveness after a painful and slow initial download, then Java is the way to go. You can write a Java applet that starts running inside the user's Web client and makes its own TCP connection back to the RDBMS, thus completely bypassing the Web server program. Of course, this won't work for people without Java-compatible Web browsers. Still, it is probably the way to go if you're doing an on-line chat system or something.
Of course, just because you are talking to the database in a reasonable manner doesn't mean that the database per se can handle the load.
Remember that backups and maintenance of an RDBMS are typically done by a full-time staffer at most companies. Be prepared for some days of pain installing and then periodic recurring pain if you want to be your own database administrator (dba). Of course, if you get good at it and this Web thing collapses, you can always get a job at a big company as a dba (Oracle dba's are always in high demand and start at $60-80,000/year).
Oracle is probably the best choice for most people, but the company is also famous for reaming its users and then holding out a bucket to catch the gushing money (how do you think Ellison paid for his NSX?). If you swagger in there and tell them you're going to have 100,000 users for your Web site running on your 16-headed DEC Alpha, they'll cheerfully look that up in their price book and hand you a bill for $1 million.
Actually, you may run out of energy before you even get a price quote. Oracle is waiting to see if this Internet fad catches on so you won't find prices on their Web site (which is wicked slow because they're waiting to see if this Netscape WIDTH and HEIGHT tag fad from 1994 is going to last). If you send email to their sales email address, they very helpfully email you back with the Oracle 800 number. I called it; a recording said that everyone was out being trained.
If you are using a single Web server to make all of your connections, then the RDBMS thinks that there is in fact only one user out there who happens to want to leave, say, 16 client programs open simultaneously. Then you might only need a minimum 5-user license which is under $16,000 for Oracle Enterprise (necessary if you want Context, the new text indexing product) or $1500 for Oracle Workgroup (basically the same program). If you write Java applets and let users connect from all over the Net, then each one counts as a separate user.
A nice thing about Oracle is that they rely on lawyers rather than C programs to manage licensing. They'll threaten to audit you and then harass you if you consistently have too many users connected, but they won't shut your Web server down because a buggy C program thinks you've exceeded your license (when you in fact haven't). [See Illustra, below.]
I've never personally seen anyone get useful assistance from Oracle support. Recent example: I reported a problem with a PL/SQL program that looked like it should work. A week later, I still hadn't heard back from anyone. So I telephoned and was told to try changing two characters in the 25-line example (I was using a PL/SQL table of records and they wanted me to switch from 0-based indexing to 1-based, something I'd in fact tried before asking for support the week before). I said "after a week, all you can tell me to do is make a change that you could have tested yourself in 30 seconds?" The change resulted in a different error message ("internal error") but still no useful computation. So I called back and was told that I eventually would be told whether this was in fact a bug. I never got a workaround. I never got an answer.
Here are the things I like about OWS 2.0:
htp.line
does? What about
<HR>
? The string interpolation feature (where you
can stick variable names inside of static strings) of Tcl and perl
makes for much more readable scripts in many cases.
Free code: If you want to see an example of an OWS 2.0 application, check out my Bulletin Board (threaded discussion group) demo. It is available as both NaviServer Tcl for Illustra and as OWS 2.0 PL/SQL.
Oracle, of course, believes that their proprietary Web servers are the future. Just check the "Server:" header below to gauge the depth of their commitment...
(Sometime late in the summer of 1996, Oracle could no longer stand the humiliation being inflicted on them by suck.com and articles like this. So they are actually using their own software now.)philg-sub-47> telnet www.oracle.com 80 Trying... Connected to www-2.us.oracle.com. Escape character is '^]'. HEAD / HTTP/1.0 HTTP/1.0 200 Document follows Date: Wed, 06 Mar 1996 22:37:51 GMT Server: NCSA/1.5 Content-type: text/html
Let's hope it is more reliable than Word and Excel...
Illustra
I've had more experience with Illustra than with any other
product and therefore more pain. I give away all of my tips for using this system.
The database has some type extension capability, which they called
"object orientation" so that it sells better to MBAs. If what you
really need is an object database, you'll find that ObjectStore and similar products are
literally about 1000 times faster. However, having higher level
types, e.g., images, can make it possible to do in SQL what you
formerly would have been forced to do with an SQL query and then
grabbing something out of the Unix file system. If you feel like
writing a bunch of C code and having your bugs crash the database
server, then you can really extend the Illustra system by adding your
own radically new types.
As far as I'm concerned, though, Illustra has really never understood the fundamental advantages of object-oriented programming that go back to the 1970s. That is, you can write a complex system in modular chunks with advertised interfaces and then the next programmer who comes along doesn't have to start from scratch. He can simply write a bit of new code that overrides and/or specializes the behavior of the existing modules. This isn't the case in Illustra and because what they built didn't adequately anticipate my needs as a Web publisher, I've ended up having to use their system exactly as I'd use Sybase or Oracle. [see my tips page]
What's the problem with that? Well, Illustra 3.2 really isn't as good an Oracle as Oracle 7. Oracle and Sybase can do at least 30 inserts and updates/second on a generic Unix box. Illustra is down around 10. Illustra SELECTs can be quite fast, but sometimes their query planner and caching algorithms will hose you; your user ends up waiting minutes for a query that would have executed in seconds on an Oracle.
I've found Illustra support to be excellent. They've gone the extra mile for me even when the problem was simply my lack of SQL ability. Unfortunately, their development staff hasn't been very responsive. They seem sure enough that they've built the ultimate "database for cyberspace" that they don't need to hear suggestions from actual Web site developers.
Informix bought Illustra in December 1995. They plan to ultimately merge the Illustra and Informix systems into a "Universal Server" (which is, confusingly, what Oracle calls its currently shipping release (Oracel 7.3)). This is currently vaporware, but one tangible change from the Informix merger is that the price of Illustra for use on the Web has gone way up.
A SQL Database->Web page has some useful links.
Jean Anderson wrote a cute interactive SQL tutorial on the Web (built with NaviServer/Illustra).
If you want to get that old time religion, Oracle Magazine has a lot of practical articles and demonstrates that Oracle might actually understand that Web some day because they removed their evil registration requirement.
The Software AG (http://www.softwareag.com/) in Germany has also a very impressive relational Database system called ADABAS. It is available on all main operating systems (including Linux). The database is mainly used in conjunction with the SAP system. Integration is available for C/C++, TCL, Perl and PHP/FI.
-- Frank Tegtmeyer, October 12, 1997