Transactionspart of SQL for Web Nerds by Philip Greenspun |
Generally, this is not how it is done. As a programmer, you write code that gets executed every time a user submits a discussion forum posting or classified ad. The structure of the SQL statement remains fixed but not the string literals after theinsert into mailing_list (name, email) values ('Philip Greenspun','philg@mit.edu');
values
.
The simplest and most direct interface to a relational database involves a procedural program in C, Java, Lisp, Perl, or Tcl putting together a string of SQL that is then sent to to the RDBMS. Here's how the ArsDigita Community System constructs a new entry in the clickthrough log:
The INSERT statement adds one row, filling in the four list columns. Two of the values come from local variables set within the Web server,insert into clickthrough_log (local_url, foreign_url, entry_date, click_count) values ('$local_url', '$foreign_url', trunc(sysdate), 1)"
$local_url
and $foreign_url
. Because these
are strings, they must be surrounded by single quotes. One of the
values is dynamic and comes straight from Oracle:
trunc(sysdate)
. Recall that the date
data
type in Oracle is precise to the second. We only want one of these rows
per day of the year and hence truncate the date to midnight. Finally,
as this is the first clickthrough of the day, we insert a constant value
of 1 for click_count
.
(delete clickthrough records more than 120 days old) and that 3500 rows indelete from clickthrough_log where entry_date + 120 < sysdate;
clickthrough_log
are older than 120 days. If your
computer failed halfway through the execution of this DELETE, i.e.,
before the transaction committed, you would find that none of the rows
had been deleted. Either all 3500 rows will disappear or none will.
More interestingly, you can wrap a transaction around multiple SQL statements. For example, when a user is editing a comment, the ArsDigita Community System keeps a record of what was there before:
This is generally referred to in the database industry as auditing. The database itself is used to keep track of what has been changed and by whom.ns_db dml $db "begin transaction" # insert into the audit table ns_db dml $db "insert into general_comments_audit (comment_id, user_id, ip_address, audit_entry_time, modified_date, content) select comment_id, user_id, '[ns_conn peeraddr]', sysdate, modified_date, content from general_comments where comment_id = $comment_id" # change the publicly viewable table ns_db dml $db "update general_comments set content = '$QQcontent', html_p = '$html_p' where comment_id = $comment_id" # commit the transaction ns_db dml $db "end transaction"
Let's look at these sections piece by piece. We're looking at a Tcl
program calling AOLserver API procedures when it wants to talk to
Oracle. We've configured the system to reverse the normal Oracle world
order in which everything is within a transaction unless otherwise
committed. The begin transaction
and end
transaction
statements never get through to Oracle; they are
merely instructions to our Oracle driver to flip Oracle out and then
back into autocommit mode.
The transaction wrapper is imposed around two SQL statements. The first
statement inserts a row into general_comments_audit
. We
could simply query the general_comments
table from Tcl and
then use the returned data to create a standard-looking INSERT.
However, if what you're actually doing is moving data from one place
within the RDBMS to another, it is extremely bad taste to drag it all
the way out to an application program and then stuff it back in. Much
better to use the "INSERT ... SELECT" form.
Note that two of the columns we're querying from
general_comments
don't exist in the table:
sysdate
and '[ns_conn peeraddr]'
. It is legal
in SQL to put function calls or constants in your select list, just
as you saw at the beginning of
the Queries chapter where we discussed Oracle's
one-row system table: dual
. To refresh your memory:
You can compute multiple values in a single query:select sysdate from dual; SYSDATE ---------- 1999-01-14
This approach is useful in the transaction above, where we combine information from a table with constants and function calls. Here's a simpler example:select sysdate, 2+2, atan2(0, -1) from dual; SYSDATE 2+2 ATAN2(0,-1) ---------- ---------- ----------- 1999-01-14 4 3.14159265
Let's get back to our comment editing transaction and look at the basic structure:select posting_time, 2+2 from bboard where msg_id = '000KWj'; POSTING_TI 2+2 ---------- ---------- 1998-12-13 4
Oracle has stopped us from deleting user 39685 because to do so would leave the database in an inconsistent state. Here's the definition of the bboard table:select user_id from bboard where msg_id = '000KWj'; USER_ID ---------- 39685 delete from users where user_id = 39685; * ERROR at line 1: ORA-02292: integrity constraint (PHOTONET.SYS_C001526) violated - child record found
Thecreate table bboard ( msg_id char(6) not null primary key, refers_to char(6), ... user_id integer not null references users, one_line varchar(700), message clob, ... );
user_id
column is constrained to be not null.
Furthermore, the value in this column must correspond to some row in the
users
table (references users
). By asking
Oracle to delete the author of msg_id 000KWj from the users
table before we deleted all of his or her postings from the
bboard
table, we were asking Oracle to leave the RDBMS in
an inconsistent state.
The /bboard module of the ArsDigita Community System has to do this. The sequence is
msg_id_generator
table
last_msg_id
column
in the msg_id_generator
table
bboard
table (from /bboard/insert-msg.tcl):
select last_msg_id from msg_id_generator for update of last_msg_id
for update
clause isn't a panacea. For example, in the
Action Network (described in Chapter 16 of Philip and Alex's
Guide to Web Publishing), we need to make sure that a
double-clicking user doesn't generate duplicate FAXes to politicians.
The test to see if the user has already responded is
By default, Oracle locks one row at a time and doesn't want you to throw a FOR UPDATE clause into a SELECT COUNT(*). The implication of that would be Oracle recording locks on every row in the table. Much more efficient is simply to start the transaction withselect count(*) from an_alert_log where member_id = $member_id and entry_type = 'sent_response' and alert_id = $alert_id
This is a big hammer and you don't want to hold a table lock for more than an instant. So the structure of a page that gets a table lock should belock table an_alert_log in exclusive mode
You might think you could use thecreate table news ( title varchar(100) not null, body varchar(4000) not null, release_date date not null, ... );
title
column as a key,
but consider the following articles:
It would seem that, at least as far as headlines are concerned, little of what is reported is truly new. Could we addinsert into news (title, body, release_date) values ('French Air Traffic Controllers Strike', 'A walkout today by controllers left travelers stranded..', '1995-12-14'); insert into news (title, body, release_date) values ('French Air Traffic Controllers Strike', 'Passengers at Orly faced 400 canceled flights ...', '1997-05-01'); insert into news (title, body, release_date) values ('Bill Clinton Beats the Rap', 'Only 55 senators were convinced that President Clinton obstructed justice ...', '1999-02-12'); insert into news (title, body, release_date) values ('Bill Clinton Beats the Rap', 'The sexual harassment suit by Paula Jones was dismissed ...', '1998-12-02);
at the end of our table definition? Absolutely. But keying by title and date would result in some unwieldy URLs for editing or approving news articles. If your site allows public suggestions, you might find submissions from multiple users colliding. If you accept comments on news articles, a standard feature of the ArsDigita Community System, each comment must reference a news article. You'd have to be sure to update both the comments table and the news table if you needed to correct a typo in theprimary key (title, release_date)
title
column or changed the
release_date
.
The traditional database design that gets around all of these problems is the use of a generated key. If you've been annoyed by having to carry around your student ID at MIT or your patient ID at a hospital, now you understand the reason why: the programmers are using generated keys and making their lives a bit easier by exposing this part of their software's innards.
Here's how the news module of the ArsDigita Community System works, in an excerpt from http://software.arsdigita.com/www/doc/sql/news.sql:
We're taking advantage of the nonstandard but very useful Oracle sequence facility. In almost any Oracle SQL statement, you can ask for a sequence's current value or next value.create sequence news_id_sequence start with 1; create table news ( news_id integer primary key, title varchar(100) not null, body varchar(4000) not null, release_date date not null, ... );
Oops! Looks like we can't ask for the current value until we've asked for at least one key in our current session with Oracle.SQL> create sequence foo_sequence; Sequence created. SQL> select foo_sequence.currval from dual; ERROR at line 1: ORA-08002: sequence FOO_SEQUENCE.CURRVAL is not yet defined in this session
You can use the sequence generator directly in an insert, e.g.,SQL> select foo_sequence.nextval from dual; NEXTVAL ---------- 1 SQL> select foo_sequence.nextval from dual; NEXTVAL ---------- 2 SQL> select foo_sequence.nextval from dual; NEXTVAL ---------- 3 SQL> select foo_sequence.currval from dual; CURRVAL ---------- 3
Background on this story: http://philip.greenspun.com/school/tuition-free-mit.htmlinsert into news (news_id, title, body, release_date) values (news_id_sequence.nextval, 'Tuition Refund at MIT', 'Administrators were shocked and horrified ...', '1998-03-12);
In the ArsDigita Community System implementation, the
news_id
is actually generated in /news/post-new-2.tcl:
This way the page that actually does the database insert, /news/post-new-3.tcl, can be sure when the user has inadvertently hit submit twice:set news_id [database_to_tcl_string $db "select news_id_sequence.nextval from dual"]
In our experience, the standard technique of generating the key at the same time as the insert leads to a lot of duplicate information in the database.if [catch { ns_db dml $db "insert into news (news_id, title, body, html_p, approved_p, release_date, expiration_date, creation_date, creation_user, creation_ip_address) values ($news_id, '$QQtitle', '$QQbody', '$html_p', '$approved_p', '$release_date', '$expiration_date', sysdate, $user_id, '$creation_ip_address')" } errmsg] { # insert failed; let's see if it was because of duplicate submission if { [database_to_tcl_string $db "select count(*) from news where news_id = $news_id"] == 0 } { # some error other than dupe submission ad_return_error "Insert Failed" "The database ..." return } # we don't bother to handle the cases where there is a dupe submission # because the user should be thanked or redirected anyway }
If you ask for a few nextvals and roll back your transaction, the sequence will not be rolled back.
You can't rely on sequence values to be, uh, sequential. They will be unique. They will be monotonically increasing. But there might be gaps. The gaps arise because Oracle pulls, by default, 20 sequence values into memory and records those values as used on disk. This makes nextval very fast since the new value need only be marked use in RAM and not on disk. But suppose that someone pulls the plug on your database server after only two sequence values have been handed out. If your database administrator and system administrator are working well together, the computer will come back to life running Oracle. But there will be a gap of 18 values in the sequence (e.g., from 2023 to 2041). That's because Oracle recorded 20 values used on disk and only handed out 2.