Community File Storage System
part of the ArsDigita Community System
by David Hill and Aurelius Prochazka
The big picture
Suppose that a bunch of people need to collaboratively maintain a set of
documents. These documents need to be organized in some way but you
don't want to require the contributors to learn HTML or filter all
emplacements of files through a Webmaster.
If you simply give everyone FTP access to a Web-accessible directory,
you are running some big security risks. FTP is insecure and passwords
are transmitted in the clear. A cracker might sniff a password, upload
.pl, .tcl, and .adp pages, then grab those URLs from a Web browser. The
cracker is now executing arbitrary code on your server with all the
privileges that you've given your Web server.
This system allows users to save their files on our server so that they
may:
- Organize files in a hierarchical directory structure
- Upload using Web forms, using the file-upload feature of Web
browsers (potentially SSL-encrypted)
- Grab files that are served bit-for-bit by the server, without
any risk that a cracker-uploaded file will be executed as code
- Retrieve historical versions of a file
Parameters
; for the ACS File-Storage System
[ns/server/yourserver/acs/fs]
SystemName=File Storage System
SystemOwner=file-administrator@yourserver.com
DefaultPrivacyP=f
; do you want to maintain a public tree for site wide documents
PublicDocumentTreeP=1
MaxNumberOfBytes=2000000
DatePicture=MM/DD/YY HH24:MI
HeaderColor=#cccccc
FileInfoDisplayFontTag=<font face=arial,helvetica size=-1>
UseIntermediaP=0
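As a rough illustration of how these parameters might be consumed, the sketch below parses the block above with Python's standard configparser and checks an upload size against MaxNumberOfBytes. ACS itself reads these values inside AOLserver; the function names load_fs_parameters and upload_allowed are hypothetical and exist only for this example.

```python
import configparser

# Parameter text copied from the section above.
AD_INI = """
; for the ACS File-Storage System
[ns/server/yourserver/acs/fs]
SystemName=File Storage System
SystemOwner=file-administrator@yourserver.com
DefaultPrivacyP=f
; do you want to maintain a public tree for site wide documents
PublicDocumentTreeP=1
MaxNumberOfBytes=2000000
DatePicture=MM/DD/YY HH24:MI
HeaderColor=#cccccc
FileInfoDisplayFontTag=<font face=arial,helvetica size=-1>
UseIntermediaP=0
"""

SECTION = "ns/server/yourserver/acs/fs"


def load_fs_parameters(text):
    """Parse the parameter block into a plain dict of strings (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the mixed-case parameter names as written
    parser.read_string(text)
    return dict(parser[SECTION])


def upload_allowed(params, n_bytes):
    """Assumed policy: accept an upload only if it fits within MaxNumberOfBytes."""
    return n_bytes <= int(params["MaxNumberOfBytes"])


if __name__ == "__main__":
    params = load_fs_parameters(AD_INI)
    print(params["SystemName"], upload_allowed(params, 1_500_000))
```

All values come back as strings, so numeric and boolean-style parameters (MaxNumberOfBytes, UseIntermediaP) need explicit conversion before use.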
Details
The file-storage system is built around a data model consisting of two
tables, one for files and a second for versions. A folder is treated as
a type of file. Files are owned by a single user, but may contain
versions created by authors other than the owner.
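The two-table idea can be sketched with an in-memory SQLite database. This is a minimal illustration, not the schema in file-storage.sql: the table names, column names, and sample rows below are assumptions chosen to show a folder stored as a kind of file and a file whose versions have different authors.

```python
import sqlite3

# Illustrative schema only; every name here is an assumption.
SCHEMA = """
create table fs_files (
    file_id   integer primary key,
    title     text not null,
    owner_id  integer not null,            -- a file has exactly one owner
    folder_p  integer not null default 0,  -- 1 when the row is a folder
    parent_id integer references fs_files(file_id)
);
create table fs_versions (
    version_id integer primary key,
    file_id    integer not null references fs_files(file_id),
    author_id  integer not null,           -- may differ from the file owner
    content    blob
);
"""


def demo():
    """Create one folder, one file in it, and two versions by different authors."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute(
        "insert into fs_files (file_id, title, owner_id, folder_p) "
        "values (1, 'Specs', 10, 1)"
    )
    db.execute(
        "insert into fs_files (file_id, title, owner_id, parent_id) "
        "values (2, 'plan.txt', 10, 1)"
    )
    db.execute(
        "insert into fs_versions (file_id, author_id, content) values (2, 10, ?)",
        (b"draft",),
    )
    db.execute(
        "insert into fs_versions (file_id, author_id, content) values (2, 11, ?)",
        (b"revised",),
    )
    rows = db.execute(
        "select author_id from fs_versions where file_id = 2 order by version_id"
    ).fetchall()
    return [author_id for (author_id,) in rows]


if __name__ == "__main__":
    print(demo())
```

Keeping folders in the same table as files means the tree can be walked through a single parent reference, at the cost of a flag column to tell the two apart.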
Permissions are attached only to files, not to folders, in order to
simplify both the code and the user interface, i.e., to avoid questions
like "Why can't any of the people in my group see my files?" answered by
"Did you notice that someone changed the permissions of the parent of
the parent of the parent folder of this file?" However, the system is easy
to extend to allow folders to have their own permissions.
The permissions are handled by the general permissions system.
No file or version can be deleted from the database, except by an
administrator. Instead, the file is deleted by setting the deleted_p
flag.
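A minimal sketch of this soft-delete pattern, again using in-memory SQLite rather than Oracle. Only the deleted_p flag comes from the text above; the table layout, the 't'/'f' encoding, and the helper names are assumptions made for the example.

```python
import sqlite3


def make_db():
    """Build a tiny stand-in table with two files (illustrative layout)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "create table fs_files ("
        " file_id integer primary key,"
        " title text not null,"
        " deleted_p text not null default 'f')"
    )
    db.executemany(
        "insert into fs_files (title) values (?)",
        [("plan.txt",), ("notes.txt",)],
    )
    return db


def soft_delete(db, file_id):
    """Mark the file as deleted; the row itself stays in the table."""
    db.execute("update fs_files set deleted_p = 't' where file_id = ?", (file_id,))


def visible_titles(db):
    """Titles shown to ordinary users: everything not flagged as deleted."""
    rows = db.execute(
        "select title from fs_files where deleted_p = 'f' order by file_id"
    )
    return [title for (title,) in rows]


def purge(db, file_id):
    """Assumed administrator-only path: physically remove the row."""
    db.execute("delete from fs_files where file_id = ?", (file_id,))


if __name__ == "__main__":
    db = make_db()
    soft_delete(db, 1)
    print(visible_titles(db))
```

Every query that lists files for ordinary users has to filter on the flag, which is the main cost of keeping deleted rows around.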
This system supports site-wide, group and individual user document trees.
Full-text Indexing
If you're running Oracle 8i (8.1.5 or later), you might want to build an
Intermedia text index (ConText) on the contents of file versions.
Intermedia incorporates very smart filtering software so that it can
grab the text from within HTML, PDF, Word, Excel, etc. documents. It is
also smart enough to ignore JPEGs and other pure binary formats.
Steps to using Intermedia:
- install Intermedia (Oracle dbadmin hell)
- get Intermedia's optional "INSO filtering" system to work. Here's
what jsc@arsdigita.com had to say about his experience doing this...
I got the INSO stuff working. The major holdup was that you have to
configure listener.ora to have $ORACLE_HOME/ctx/lib in
LD_LIBRARY_PATH. The docs mumble something about editing listener.ora,
but a careful perusal of anything having to do with networking setup
didn't turn up any examples. The networking assistant program has a
field for "Environment", but when you try to put anything in there, the
program hits a null pointer exception when you go to save it and doesn't
write anything. I "solved" this eventually by just symlinking all the
.so files in ctx/lib into $ORACLE_HOME/lib, which is already in the
LD_LIBRARY_PATH for the listener.
- In order to keep the interMedia index up to date whenever documents
get added or updated, either the index must be synchronized (using
alter index indexname rebuild online parameters ('sync')), or the ctxsrv
process must be run, which updates all interMedia indices periodically
(ctxsrv -user ctxsys/ctxpassword). If using ctxsrv, the shell which
starts it must have $ORACLE_HOME/ctx/lib as part of LD_LIBRARY_PATH.
- uncomment the create index fs_versions_content_idx statement in
file-storage.sql (and then feed it to Oracle)
- set UseIntermediaP=1 in your ad.ini file
- restart AOLserver (so that it reads the new parameter setting)
Warning: Intermedia is a tricky product for users. The default mode is
exact phrase matching, which means that the more a user types the fewer
search results will be returned (a violation of the user interface
guidelines in developers). So you
might be letting yourself in for some education of users...
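To see why exact phrase matching shrinks the result set as the query grows, here is a toy model in Python. It is not Intermedia's query engine or syntax; the tokenizer, the matching rule, and the sample documents are all assumptions made for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def matches_phrase(document, query):
    """True if the query words occur in the document consecutively, in order."""
    doc, q = tokenize(document), tokenize(query)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


def search(documents, query):
    """Return the documents that contain the query as an exact phrase."""
    return [d for d in documents if matches_phrase(d, query)]


# Made-up sample documents.
DOCS = [
    "quarterly budget report for the sales team",
    "budget planning notes",
    "sales report draft",
]

if __name__ == "__main__":
    for query in ("budget", "budget report", "budget report sales"):
        print(query, "->", len(search(DOCS, query)), "hit(s)")
```

In this model each extra word adds a constraint on adjacency and order, so a longer query can only match the same documents or fewer.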
Future Improvements
- Currently the administration section needs considerable work. Instead of trying to clean /admin/file-storage/ up, we should build a better /file-storage/admin or even allow administrators to do more within /file-storage/.
- Ticket Tracker style column sorting. We want the ability to sort the contents of each folder by name, author, size, type and last modified. In addition, the folders should be able to sort among themselves by name. You should use something very similar to the procedure ad_table. The procedure that you use will be slightly different because the files will be sorted on a per folder basis instead of on a per table basis.
- Better organization of the folder tree. Make the interface more
like a Windows-style interface. Add a + icon next to the folder
if the folder is open and all of the files in the folder can be seen.
Add a - icon when the folder is closed and can be expanded. Clicking
on the + sends the user back to the same page with the contents of the
folder hidden and the - icon in place of the +. Clicking on the -
sends the user back to the same page with a + in place of the - and
all of the files in the folder shown. Clicking on the folder
icon or name should act just as it does now.
- Nifty javascript version
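The per-folder sorting idea from the list above can be sketched as follows. This is only an illustration of the grouping logic in Python, not an implementation based on ad_table; the record fields, the sample data, and the function name sort_within_folders are hypothetical.

```python
from collections import defaultdict

# Made-up sample records; field names are assumptions.
FILES = [
    {"folder": "specs", "name": "plan.txt", "author": "bob", "size": 1200},
    {"folder": "specs", "name": "api.txt", "author": "alice", "size": 5400},
    {"folder": "admin", "name": "todo.txt", "author": "alice", "size": 300},
    {"folder": "specs", "name": "faq.txt", "author": "carol", "size": 800},
]


def sort_within_folders(files, key, reverse=False):
    """Order folders by name, then sort each folder's files by the given key."""
    by_folder = defaultdict(list)
    for entry in files:
        by_folder[entry["folder"]].append(entry)
    return {
        folder: sorted(by_folder[folder], key=lambda e: e[key], reverse=reverse)
        for folder in sorted(by_folder)
    }


if __name__ == "__main__":
    for folder, entries in sort_within_folders(FILES, "size").items():
        print(folder, [e["name"] for e in entries])
```

The sort key applies only inside each folder, so changing it reorders files without moving them between folders or changing the folder order.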
aure@arsdigita.com