Server Clustering


Tcl: /tcl/ad-server-cluster.tcl

The Problem

Many heavily-hit sites sit behind load balancers, which means that requests to a particular site can be handled by any one of several machines conspiring to appear as a single server. For instance, requests to www.foobar.com might be routed to www1.foobar.com, www2.foobar.com, or www3.foobar.com, three physically separate servers which share an Oracle tablespace (and hence all the data in ACS).

Many database queries are memoized in individual servers' local memory (using the util_memoize procedures) to minimize fetches from the database. When a server updates an item in the database, the old item needs to be removed from the server's local cache (using util_memoize_flush) to force a database query the next time this item is accessed. But what happens when:

www1.foobar.com does util_memoize "get_greeble_info 43" (incurring an actual database lookup, SELECT * FROM greeble WHERE greeble_id = 43, and caching the result)
www2.foobar.com does util_memoize "get_greeble_info 43" (incurring a database lookup and caching the result)
www1.foobar.com UPDATEs the info for greeble #43 and does util_memoize_flush "get_greeble_info 43"
www2.foobar.com does util_memoize "get_greeble_info 43" (returning a cached value). The old info for greeble #43 hasn't been flushed from www2's local cache, so the result is outdated!

In general, if any of several servers can update an item, the old version of the item can remain in other servers' local caches. Doh!
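The sequence above can be reproduced with a small stand-alone model. This is only a sketch in Python, not ACS code: the Server class, its method names, and the dictionary standing in for the shared Oracle tablespace are all made up for illustration.

```python
class Server:
    """One cluster member with its own in-memory memoization cache (hypothetical model)."""

    def __init__(self, name, database):
        self.name = name
        self.database = database  # shared by every server, like the Oracle tablespace
        self.cache = {}           # local to this server only

    def memoize(self, greeble_id):
        # Hit the shared database only on a local cache miss.
        if greeble_id not in self.cache:
            self.cache[greeble_id] = self.database[greeble_id]
        return self.cache[greeble_id]

    def update(self, greeble_id, info):
        # Write to the shared database, then flush the LOCAL cache entry only.
        self.database[greeble_id] = info
        self.cache.pop(greeble_id, None)


database = {43: "old info"}
www1 = Server("www1", database)
www2 = Server("www2", database)

www1.memoize(43)             # www1 caches the old info
www2.memoize(43)             # www2 caches the old info
www1.update(43, "new info")  # www1 updates and flushes its own cache

fresh = www1.memoize(43)     # cache miss on www1, so it re-reads the database
stale = www2.memoize(43)     # cache hit on www2, which never heard about the flush
print(fresh, "|", stale)
```

In this model www1 sees the new info while www2 keeps serving the old info, because nothing ever told www2 to flush.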

The Solution

We introduce the concept of a server cluster, a group of look-alike servers sharing an Oracle tablespace. To set up a cluster, add the following to the ACS parameters/yourservername.ini file on each of the servers in the cluster:

; address information for a cluster of load-balanced servers (to enable
; distributed util_memoize_flushing, for instance).
[ns/server/yourservername/acs/server-cluster]
; is clustering enabled?
ClusterEnabledP=1
; which machines can issue requests (e.g., flushing) to the cluster?
ClusterAuthorizedIP=192.168.16.*
; which servers are in the cluster? This server's IP may be included too
ClusterPeerIP=192.168.16.1
ClusterPeerIP=192.168.16.2
ClusterPeerIP=192.168.16.3
; N.B.: www1 = 192.168.16.1, www2 = 192.168.16.2, www3 = 192.168.16.3
; log clustering events?
EnableLoggingP=1

(Of course, you'll want to replace the IP addresses with the actual IPs of the hosts in the cluster.)
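To get a feel for what a pattern like 192.168.16.* in ClusterAuthorizedIP would admit, here is a minimal Python sketch. It assumes the pattern is matched glob-style against the requester's IP address; this page does not say how ACS actually performs the match, and the is_authorized helper is hypothetical.

```python
from fnmatch import fnmatchcase

# Value copied from the sample configuration above.
CLUSTER_AUTHORIZED_IP = ["192.168.16.*"]


def is_authorized(ip, patterns=CLUSTER_AUTHORIZED_IP):
    """Return True if ip matches any pattern (glob-style matching is an assumption)."""
    return any(fnmatchcase(ip, pattern) for pattern in patterns)


print(is_authorized("192.168.16.2"))   # a cluster peer
print(is_authorized("192.168.160.1"))  # similar prefix, different subnet
print(is_authorized("10.0.0.5"))       # unrelated host
```

Under that assumption only the first address is accepted: the literal dot before the * keeps 192.168.160.1 out.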

Now when a server (say, www1.foobar.com) invokes util_memoize_flush or util_memoize_seed, those routines use server_cluster_httpget_from_peers to issue an HTTP GET request to all machines in the cluster (omitting the local server):

GET http://www2.foobar.com/SYSTEM/flush-memoized-statement.tcl?statement=tcl-statement
GET http://www3.foobar.com/SYSTEM/flush-memoized-statement.tcl?statement=tcl-statement

causing the other machines (www2.foobar.com and www3.foobar.com) to flush the Tcl statement from their local caches. This is transparent and works with all existing code.
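The fan-out can be pictured with a short Python sketch that builds one flush URL per peer and skips the local server. It is not the server_cluster_httpget_from_peers implementation: the peer_flush_urls helper is hypothetical, and URL-encoding the statement is an assumption, since this page only shows the placeholder tcl-statement.

```python
from urllib.parse import urlencode


def peer_flush_urls(peers, local_host, statement):
    """Build one flush URL per peer, omitting the local server (hypothetical helper)."""
    # Encoding the statement as a query parameter is an assumption of this sketch.
    query = urlencode({"statement": statement})
    return [
        f"http://{host}/SYSTEM/flush-memoized-statement.tcl?{query}"
        for host in peers
        if host != local_host
    ]


peers = ["www1.foobar.com", "www2.foobar.com", "www3.foobar.com"]
urls = peer_flush_urls(peers, "www1.foobar.com", "get_greeble_info 43")
for url in urls:
    print(url)
```

Called from www1, the sketch yields URLs for www2 and www3 only, which mirrors the two GET requests shown above.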

So don't think about it - just set up the server-cluster block in your yourservername.ini file, and util_memoize and friends will be happy.

jsalz@mit.edu