ArsDigita Server Architecture Auditing
by Jin Choi, part of ArsDigita Server Architecture
This documents an audit procedure for a single physical server running
the ArsDigita Server Architecture.
There are several systems involved in each installation:
- Unix
- Oracle
- AOLserver
- ArsDigita software
Unix
This audit is independent of the day-to-day monitoring done by ArsDigita
Cassandrix, as specified in the ArsDigita Server Architecture.
Check that sufficient disk space is available using "df -k". Any
partition close to 100% is bad, and should be brought back down at
once.
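The disk space check can be sketched as a small filter over the output of "df -k". This is only an illustration: the function name full_partitions and the default limit of 90% are assumptions, not part of our setup, and it assumes a df that prints a "NN%" field and puts the mount point last on each line (some systems, HP-UX among them, format df output differently) and an awk that accepts -v.

```shell
# Read "df -k" output on stdin and print "mountpoint NN%" for every
# filesystem whose usage is at or above the limit (hypothetical default: 90).
full_partitions() {
    limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+%$/) {
                pct = $i
                sub(/%/, "", pct)
                # The mount point is assumed to be the last field on the line.
                if (pct + 0 >= limit + 0) print $NF, $i
            }
        }
    }'
}
```

Run it as "df -k | full_partitions 95"; no output means no partition is at or above the limit.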
On arsdigita.com, there is a script
/usr/local/bin/check-on-stuff.pl
that runs a set of ad-hoc Unix checks. We may wish to put a variant of
this on all our current systems.
Backups
We do not currently have a universal system for doing backups. How
they are carried out depends on which version of Unix the machine is
running, what kind of tape drive it has, and whether it was set up
before or after TechSquare started taking care of backups. To find out
if backups are being carried out properly, su to root, run crontab -l,
pick out the line that looks like it handles system backups, and read
the script it runs to see what software it uses and what (if any)
logging is being carried out.
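Picking the backup line out of root's crontab can be narrowed down with a filter like the one below. It is a rough sketch: the function name backup_cron_lines and the pattern "dump|backup" are guesses at what a backup entry looks like, so a script with an unrelated name will be missed and the full crontab should still be read by eye.

```shell
# Read "crontab -l" output on stdin and print the non-comment entries
# that mention "dump" or "backup" (an assumed naming convention).
backup_cron_lines() {
    grep -v '^[[:space:]]*#' | grep -i -E 'dump|backup'
}
```

As root, run "crontab -l | backup_cron_lines", then open the script named on the matching line.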
If there is currently a tape in the drive, you can check the result of
the backup directly. Most of our systems rely on a variant of "dump"
for backups. To check out a backup on tape, you must:
- Find out which device file represents the tape drive. On Solaris,
this is generally /dev/rmt/0n, on HPUX, /dev/rmt/0mnb; check the
backup script from crontab -l to see which one it uses. Set the TAPE
environment variable to this device.
- Use "mt" to rewind the tape: "mt rewind" on Solaris, "mt rew" on
HP.
- Use "restore" in interactive mode to poke around the tape:
restore if <tape-device>
"restore" might be ufsrestore (Solaris) or vxrestore (HP).
- Try restoring a file. Make sure you are in a scratch directory of
some sort, then use "add <filename>" to mark a file for
restoration and "extract" to recover all marked files. Marking a
directory will recursively recover the directory.
- To check any partition except for the first, you will need to use
"mt fsf" to fast forward the tape. Check the backup script to see what
order the partitions are dumped in.
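The per-system differences in the steps above can be collected in one place. The sketch below only prints the commands to type; it does not touch the tape. The function name tape_check_commands is made up for illustration, the device names are the usual ones mentioned above (the backup script remains the authority), and the system name is expected in the form printed by "uname -s".

```shell
# Print the tape-check commands for a system name as reported by "uname -s".
# Device names are the typical defaults; confirm them in the backup script.
tape_check_commands() {
    case "$1" in
        SunOS) dev=/dev/rmt/0n;   rewind="mt rewind"; restore=ufsrestore ;;
        HP-UX) dev=/dev/rmt/0mnb; rewind="mt rew";    restore=vxrestore ;;
        *) echo "unknown system: $1" >&2; return 1 ;;
    esac
    echo "TAPE=$dev; export TAPE"
    echo "$rewind"
    echo "$restore if $dev"
}
```

Calling tape_check_commands "$(uname -s)" prints the three commands in order; run them by hand from a scratch directory.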
Oracle
Make sure Oracle is running and that we aren't bumping up against
process limits. Try connecting to Oracle using sqlplus.
Backups
Oracle backups are handled by doing consistent exports every
evening. The location and times of these dumps differ from machine to
machine. To find where the exports are going, run "crontab -l" as root
to find the script which does the exports. Make sure that script is
using the proper Oracle system password. The latest export files in
the export directory should be timestamped no earlier than some time
the previous night. There should be sufficient disk space to store two
copies of the latest versions of the exports, so that the next exports
can happen (much more than two copies if they are compressed).
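One way to check the timestamps is to list the files in the export directory that were modified within the last 24 hours, which approximates "since the previous night". This is a sketch: the function name fresh_exports is an assumption, and the export directory must be taken from the cron script because it differs from machine to machine.

```shell
# List files under the given export directory that were modified
# less than 24 hours ago. $1 = export directory (machine-specific).
fresh_exports() {
    find "$1" -type f -mtime -1
}
```

If fresh_exports prints nothing for the export directory, last night's export most likely did not run. Running "df -k" on the same directory shows whether there is room for the next one.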
As these exports are done from root's cron, cron will send the results
of the export to root. If you would like to receive these nightly
mailings, add yourself as one of the recipients of the "root" mailing
account (if running qmail, add your email address to
/var/qmail/alias/.qmail-root).
Mail
Most of our newer systems run qmail as their mailer. If you want to
figure out how to route mail around using qmail, read this.
To make sure qmail is functioning properly:
AOLservers
Grep for nsd in /etc/inittab to see what servers are supposed to be
running. Make sure all of those servers are indeed running using
"ps -ef | grep .ini".
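The comparison between inittab and the process list can be done mechanically. The sketch below assumes that each nsd line in inittab names its .ini file as a separate argument and that the same path shows up in the "ps -ef" output; the function names expected_inis and missing_servers are invented for this example.

```shell
# Print the .ini files named on the uncommented nsd lines of an
# inittab-style file read from stdin.
expected_inis() {
    grep nsd | grep -v '^#' | tr ' \t' '\n\n' | grep '\.ini$' | sort -u
}

# $1 = inittab file, $2 = file holding saved "ps -ef" output.
# Prints every expected .ini file that no running process mentions.
missing_servers() {
    expected_inis < "$1" | while read -r ini; do
        grep -F -q -- "$ini" "$2" || echo "$ini"
    done
}
```

Save the process list first (for example "ps -ef > /tmp/ps.out"), then run "missing_servers /etc/inittab /tmp/ps.out". Any path printed is a server that should be running but is not.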
ArsDigita services
We have three monitoring services that each run as their own process
on various servers: keepalive, rollover, and reporte.
- Keepalive: check that it is actually checking the other servers by
grepping each server's access log for hits on /SYSTEM/dbtest.tcl.
- Reporte: check that it is actually generating reports for each day
by visiting each server's reports and visually verifying that they
look like reasonable reports and that none are missing.
- Rollover: see below
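The keepalive check amounts to counting access log lines that mention the test page. A minimal sketch follows; the function name keepalive_hits is an assumption, and the access log path has to be supplied because it is not fixed by this document.

```shell
# Count the lines in an access log that request /SYSTEM/dbtest.tcl.
# $1 = path to one server's access log.
keepalive_hits() {
    grep -c -F -- '/SYSTEM/dbtest.tcl' "$1"
}
```

A count of 0 for a server that keepalive is supposed to watch means keepalive is not reaching it.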
Logs
Logs tend to grow without bound unless checked. Various logs to check
to make sure they aren't getting ludicrously big:
- AOLserver error logs: "ls -l /home/nsadmin/log/*-error.log". If
one of them is unusually large, make sure it is getting rolled
(generally by the rollover service, sometimes from the keepalive
service). To roll by hand, remove the file, then restart the AOLserver
that generates it.
- Email logs: Usually /var/log/syslog. Might have been put somewhere
else; check /etc/syslog.conf. Sometimes we just turn it off entirely,
because it is so voluminous. To roll, remove the file, touch that file
to recreate it (might not be necessary), and kill -HUP the syslogd
process.
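Oversized AOLserver error logs can be found without reading the "ls -l" output by eye. In the sketch below the function name big_logs and whatever limit is passed to it are assumptions; there is no agreed size at which a log counts as too big. The limit is given in 512-byte blocks, the unit that find's -size test uses when no suffix is given.

```shell
# List *-error.log files under a directory that are larger than a limit.
# $1 = log directory, $2 = limit in 512-byte blocks (204800 is about 100 MB).
big_logs() {
    find "$1" -type f -name '*-error.log' -size +"$2"
}
```

For example, "big_logs /home/nsadmin/log 204800" lists the error logs over roughly 100 MB; check that each one listed is being rolled.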
jsc@arsdigita.com