The Symptomatic Filesystem is a new design for a large scale distributed network file service. The name is chosen to reflect the fact that the physical location of any given file in the service is unknown at all times, and that users only interact with symptoms of files, rendering the files themselves safe both from malicious attack and network congestion.
SFS is motivated by a number of scenarios, none of which modern network filesystems can adequately handle:
The Government of Evilland is very evil, and Joe Average (an Evilland expatriate) wants the whole world to know it. Joe publishes a "web page" or similar resource outlining the evil things that the Government of Evilland does. The Government of Evilland promptly sets their Evil Internet Goon Squad on the task of finding and eliminating the network server serving this page. They do so by any number of known attack patterns -- none at all sophisticated. They attempt to flood the TCP stack, take down nearby routers, send harassing emails to the ISP, etc. etc. until the page goes away.
Something amazingly cool is published on the internet, and within minutes news of its existence has multiplied through the geometry of email forwarding and IRC to literally millions of interested onlookers. Each of them loads the resource, even if just to see what all the fuss is about. The site the resource is published on is completely swamped, and falls over from the strain of trying to satisfy all the requests. Nobody gets to see it at all.
You implement a security system which relies on the availability of public or private crypto keys. You encrypt everything, and as soon as you finish, the one disk which is holding the crypto keys crashes. You're sunk.
A very smart person, known to and loved by many a computer user, wanders the globe sleeping on various people's couches and taking leftovers from their fridges in exchange for little gems of advice which come from the twisted recesses of her genius mind. She is not affiliated with any institution or even a reliable ISP, since she has no permanent address for billing or anything. Nonetheless she is regularly in front of a terminal, wanting to check her mail, edit her files, have emacs start up just the right way, etc. She does not trust any one person to store her files, as they may suddenly become unavailable, so she makes do with floppy disks, rsync, and a variety of obscure FTP servers she knows nobody will look at.
For SFS to be successful, it must address these motivating cases in a satisfactory manner.
In its idealized state, the SFS model is an array of exactly 2^128 files (figure 1), each of which may be as large or as small as desired. The files, which have unique locations in the array, also possess unique names. No two files can have the same name, but aside from that, names may be arbitrary binary strings of any sort. Files can only be accessed by name. There is no intrinsic browsing, indexing, or hierarchical directory structure imposed on top of SFS -- that is left to a higher-level protocol.
Figure 1: the simplified SFS

    ...
    6eb9 adb8 4e14 cdf2 a51f 325c 4730 2979    "Fried Chicken"
    6eb9 adb8 4e14 cdf2 a51f 325c 4730 297A    "David and Goliath"
    6eb9 adb8 4e14 cdf2 a51f 325c 4730 297B    "Linux Beer Steins"
    ...
The array is distributed amongst a set of server hosts. Each server holds some, but not all, of the files. A user process interacting with the servers need only find one server in the set to locate any file in the SFS array by name. There can be many distinct, disconnected SFS arrays built (for instance, a single organization may want one internally) as SFS is self organizing and does not depend on any external authority or administrative structure like DNS, IP numbers, ASNs, etc. The mechanisms for accessing files given their name and distributing files between hosts are the remaining interesting details of this paper.
As stated above, file names are free-form binary strings, as are their contents. Names are stored along with files at their locations in SFS (to uniquely determine files), and files can be protected by cryptographic features such as public keys, so that they can only be altered by clients possessing the proper credentials. The numerical location for a file is arrived at by applying a secure cryptographic hash function such as SHA-1 to the file name and truncating the result to 128 bits; this integer is the file's index in the SFS. The likelihood of hash collision is minor; besides, multiple files can be stored at any one location (given sufficient storage on the host) so long as they have different names.
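To make the addressing concrete, here is a minimal sketch of the index computation. It assumes OpenSSL's SHA1() as the hash implementation and simple truncation of the 160-bit digest to 128 bits; the design itself does not mandate a particular library.

    /* A minimal sketch (not part of the design): derive a file's 128-bit
     * SFS index from its name.  Assumes OpenSSL's SHA1(); the 160-bit
     * digest is simply truncated to fit the 2^128 address space. */
    #include <stdio.h>
    #include <string.h>
    #include <openssl/sha.h>

    typedef struct { unsigned char b[16]; } sfs_index;   /* a 128-bit SFS address */

    sfs_index sfs_index_of(const char *name)
    {
        unsigned char digest[SHA_DIGEST_LENGTH];          /* 20 bytes from SHA-1 */
        sfs_index idx;
        SHA1((const unsigned char *)name, strlen(name), digest);
        memcpy(idx.b, digest, sizeof idx.b);              /* keep the first 128 bits */
        return idx;
    }

    int main(void)
    {
        sfs_index idx = sfs_index_of("Fried Chicken");
        for (int i = 0; i < 16; i++)
            printf("%02x", idx.b[i]);                     /* the file's slot in figure 1's array */
        printf("\n");
        return 0;
    }

(Compile with -lcrypto.) The printed value is the name's index in the idealized array.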
If we let the function char *sha(char *c) be an implementation of the SHA-1 hash algorithm (truncated to 128 bits as above), and consider the function's iterates starting from a given name nm, we get a well-defined infinite sequence of 128-bit integers (figure 2).
Figure 2: Iterated Hashing

    sha(nm)                  6eb9adb84e14cdf2a51f325c47302979
    sha(sha(nm))             ea1673a5cfac98bb31a116b104163db3
    sha(sha(sha(nm)))        46f92615dc993376dd6db7ea354c7b49
    sha( ... sha(nm) ... )   some other integer here
We will call this sequence of integers the Hash Chain of nm. It is extremely unlikely that any two files published by the same individual will have the same hash chain, even if their names differ only by a single bit. Cryptographic hash algorithms are designed to come up with very different output given even slightly different input.
The purpose of computing a hash chain in the first place is that it gives not only the primary or "home" address of a file in SFS, but also a sequence of "backup" hosts which a client process may wish to query if the primary is unreachable. In fact, as we shall see, failure of any given server in the hash chain is of little consequence to the overall likelihood of retrieval.
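Assuming the same truncated hash as in the earlier sketch, the chain is generated by feeding each index back into sha(). The text leaves open exactly what representation is re-hashed (the binary digest or its hexadecimal spelling); the sketch below re-hashes the 16 truncated bytes, which is only one reasonable choice.

    /* Sketch: print the first k entries of a name's hash chain.
     * Each entry is obtained by re-hashing the previous 128-bit index;
     * re-hashing the raw bytes (rather than a hex string) is an assumption. */
    #include <stdio.h>
    #include <string.h>
    #include <openssl/sha.h>

    static void sha128(const unsigned char *in, size_t n, unsigned char out[16])
    {
        unsigned char digest[SHA_DIGEST_LENGTH];
        SHA1(in, n, digest);
        memcpy(out, digest, 16);                  /* truncate to 128 bits */
    }

    static void hash_chain(const char *name, int k)
    {
        unsigned char link[16];
        sha128((const unsigned char *)name, strlen(name), link);   /* sha(nm) */
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < 16; j++)
                printf("%02x", link[j]);
            printf("\n");
            sha128(link, sizeof link, link);      /* next backup location in the chain */
        }
    }

    int main(void)
    {
        hash_chain("Fried Chicken", 4);           /* primary plus three backups */
        return 0;
    }

A client can therefore regenerate as much of the chain as it needs from nothing but the file's name.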
The motivating cases above are difficult to address primarily because any illegitimate user wishing to disrupt file service need only masquerade as a legitimate user long enough to locate the server with a file they dislike, and then commence an attack on that server (uniquely determined by IP address). SFS solves this scenario in an unorthodox way: rather than trying to resist attack, when a file server S even suspects it is under any form of attack, it immediately falls over, dead as a doornail. The only thing S needs to ensure is that its files have been safely mirrored elsewhere before it cuts off service. While this mirroring activity could in principle be driven by the attack itself, things are much simpler and more secure if all file servers simply follow an algorithmic rule of thumb: in order for S to even begin providing file service for a given file named N, S must first secure a backup location somewhere along the hash chain for N, and only once the backup acknowledges that it has a copy can S begin to service requests.
This simple rule, plus a little careful thought about the sequencing of communication, puts the attacker in a very difficult position. Before commencing any given attack (which takes at least some resources), they must learn of the file's location. But in order to learn of a file's location, they must receive a positive reply from some server in the hash chain, at which point they know the file will already have been backed up to some unknown position elsewhere in the chain. The only reliable form of attack involves hitting the entire hash chain, which (since it is theoretically infinite and sparsely distributed throughout the address space) becomes increasingly difficult as the size of the set of file servers grows. Thus, any attack (or failure, which is modeled as attack) is always slightly too late to cause any damage to the overall system.
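In code, a server honouring the rule above might look like the following sketch; next_chain_entry(), is_local() and request_backup() are hypothetical placeholders standing in for the hashing and wire protocol, which this paper does not pin down.

    /* Sketch of the "backup before service" rule.  The extern helpers are
     * hypothetical placeholders, not a specified API. */
    #include <stdbool.h>

    typedef struct { unsigned char b[16]; } sfs_index;

    extern sfs_index next_chain_entry(sfs_index idx);   /* sha(idx), truncated       */
    extern bool      is_local(sfs_index idx);           /* served by this host?      */
    extern bool      request_backup(sfs_index idx, const char *name,
                                    const void *data, long len); /* true on peer ack */

    /* S may begin serving the file named `name` only if this returns true. */
    bool begin_service(sfs_index file_idx, const char *name, const void *data, long len)
    {
        /* walk further along the hash chain of `name`, starting just past the
         * position S serves, until some other host acknowledges a copy */
        sfs_index idx = next_chain_entry(file_idx);
        for (int hops = 0; hops < 64; hops++, idx = next_chain_entry(idx)) {  /* arbitrary cut-off */
            if (is_local(idx))
                continue;                    /* a copy on S itself is not a backup */
            if (request_backup(idx, name, data, len))
                return true;                 /* acknowledged: service may start    */
        }
        return false;                        /* no backup secured: do not serve    */
    }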
In order to best capitalize on the concept of hash chains, the file retrieval protocol is closely connected to the chains themselves. A request for a file is sent to any host (chosen at random) in the hash chain, which forwards the request "up" the chain towards the first hash iterate, host by host. If any host in the chain is unavailable, it is skipped, and the message winds its way upwards. When the message reaches a host which is willing to satisfy the request, the response is bundled in another message and sent back down the chain, addressed to the member who initiated the request. That host may cache the response if the request was read-only and a special "loose security" flag is set on the file, but (like all servers) the caching host must ensure that it has a backup before it actually serves any further requests. It then becomes a "mirror" for the original file, which helps balance the load. Requests which enter the chain "below" the caching host will be served from its cache; requests which enter the chain "above" it will be served by the initial image.
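The forwarding step can be sketched as follows. Because the chain is derived from the file's name, every host can recompute any iterate locally, so a request only needs to carry the name and its current depth in the chain; the transport helpers below are hypothetical placeholders.

    /* Sketch: forwarding a retrieval request "up" the hash chain toward
     * sha(nm).  Message fields and transport helpers are placeholders;
     * the design fixes only the direction of travel and the skip rule. */
    #include <stdbool.h>

    typedef struct { unsigned char b[16]; } sfs_index;

    typedef struct {
        const char *name;     /* free-form file name                      */
        int         depth;    /* which iterate of sha() we are sitting at */
        sfs_index   reply_to; /* chain member that accepted the request   */
    } sfs_request;

    /* hypothetical helpers */
    extern sfs_index chain_entry(const char *name, int depth);  /* sha applied depth times, truncated */
    extern bool      have_file(const char *name);
    extern bool      reachable(sfs_index idx);
    extern void      send_to(sfs_index idx, const sfs_request *req);
    extern void      reply_down_chain(const sfs_request *req, bool found);

    void handle_request(sfs_request req)
    {
        if (have_file(req.name)) {                /* we can satisfy it: answer goes back down */
            reply_down_chain(&req, true);
            return;
        }
        /* otherwise pass it upward, skipping unreachable chain members */
        for (int d = req.depth - 1; d >= 1; d--) {
            sfs_index up = chain_entry(req.name, d);
            if (reachable(up)) {
                req.depth = d;
                send_to(up, &req);
                return;
            }
        }
        reply_down_chain(&req, false);            /* we are the head: rewrite as a failure */
    }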
If someone has attacked a server which was serving the requested file, the request will make it to the head of the hash chain (the lowest-numbered iterate which is still online), be rewritten as a failure message, and work its way back down the chain, passing through all the hosts in between. In this case, one of two things will happen:
The failure message will have passed by a host on the chain which was a backup for the original file. This "backup host" now knows that someone or some act of nature has crashed the "primary host". It will then (sometime in the near future) quietly make its own backup somewhere else in the chain. It will do nothing else until it hears another request. It has now assumed primary service for the file.
The failure message did not pass such a backup host. Nothing happens.
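The two cases above could be handled by something as small as the following sketch (again with hypothetical helpers), run on each host a failure message passes through.

    /* Sketch: reaction of a chain member to a failure message travelling
     * back down the chain.  All three helpers are hypothetical. */
    #include <stdbool.h>

    extern bool have_backup_of(const char *name);      /* do we hold a backup copy?         */
    extern void promote_to_primary(const char *name);  /* serve future requests ourselves   */
    extern void schedule_rebackup(const char *name);   /* quietly back it up elsewhere soon */

    void on_failure_message(const char *name)
    {
        if (!have_backup_of(name))
            return;                       /* second case: not a backup host, nothing happens */
        /* first case: the primary is gone; assume primary service and
         * arrange a fresh backup somewhere further along the chain. */
        promote_to_primary(name);
        schedule_rebackup(name);
    }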
Assume for a moment that the idealized SFS of figure 1 is already distributed in some uniformly random way amongst a large number of hosts on the internet. The numeric file addresses served by each host do not correspond in any predictable way to the host's IP address; thus there is clearly a problem of even finding the IP address of any host in a given hash chain. SFS solves this problem by ensuring that each host in the network has a special kind of map of the 128-bit SFS address space; a client only needs to find one host and make use of the host's map in order to find all other hosts in the network.
Since the address space of SFS is quite large, it is infeasible to store all 2^128 addresses in a flat file, or even to store a list of all active servers in a given network in a flat file. However, we would like SFS to be self-organizing and free of central failure points, so a tree-structured addressing system like IP or the DNS is not a viable option. What we choose instead is a model in which each server has a "home address" which is the center of the numerical range it is serving, and it maps out the space numerically close by; the map becomes more and more sparse and inaccurate as the numerical distance between the mapping host and the mapped host increases. The implementation is mathematically quite simple: each host stores an array with exactly 255 slots in it (figure 3). Each slot S holds the IP address of some host which serves an SFS address A with ((int) log(A-H)) == S, where H is the home address of the mapping host and the logarithm is taken in base 2. We refer to this construct as a log map.
Figure 3: A Log Map

     127   sfs.snark-hunter.com
     ...
       1   sfs.chicken-cow.net
       0   localhost
      -1   sfs.bazooka.com
     ...
    -127   sfs.flimsy.edu
If all hosts in the SFS have log maps, then a client can perform a reasonably efficient binary search of the address space. The client simply asks any nearby server for a file with address A, and (assuming the server does not belong to the file's hash chain, which would be remarkably lucky) the server takes log(A-H), truncates it to an integer I, jumps to the I'th entry in its log map, and forwards the request to that host. Thus even in the worst case, where each file is on a separate server and there are 2^128 servers, it takes at most 128 network hops to find a server in the correct hash chain. In practice, the search should terminate much more rapidly.
For the purposes of navigation and numerical distribution, the SFS address space should be considered circular; that is, for any address A, we have that A + 2^128 == A - 2^128 == A.
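One plausible reading of the slot computation, using the circular distance just described, is sketched below. It relies on the GCC/Clang unsigned __int128 extension purely for brevity, and the sign convention at the wrap-around point is my own assumption, since the text does not spell it out.

    /* Sketch: compute a log-map slot for target address A as seen from home
     * address H -- one plausible reading of ((int) log2(A - H)) == S on the
     * circular 2^128 space.  Uses the GCC/Clang unsigned __int128 extension;
     * a portable version would use a 16-byte representation instead. */
    #include <stdio.h>

    typedef unsigned __int128 sfs_addr;

    static int floor_log2_u128(sfs_addr x)        /* requires x > 0 */
    {
        unsigned long long hi = (unsigned long long)(x >> 64);
        unsigned long long lo = (unsigned long long)x;
        return hi ? 127 - __builtin_clzll(hi) : 63 - __builtin_clzll(lo);
    }

    /* slot index in [-127, 127]; 0 means localhost (or a neighbour one step away) */
    int log_map_slot(sfs_addr home, sfs_addr target)
    {
        sfs_addr d = target - home;               /* wraps mod 2^128: circular space  */
        if (d == 0)
            return 0;
        if (d >> 127)                             /* "far" half: treat as a negative offset */
            return -floor_log2_u128(-d);          /* -d == 2^128 - d                        */
        return floor_log2_u128(d);
    }

    /* how the 255-entry map of figure 3 would be indexed: slot s lives at map[s + 127] */
    #define MAP_SLOT(s) ((s) + 127)

    int main(void)
    {
        sfs_addr home   = ((sfs_addr)0x6eb9adb84e14cdf2ULL << 64) | 0xa51f325c47302979ULL;
        sfs_addr target = home + ((sfs_addr)1 << 100);
        printf("forward via slot %d\n", log_map_slot(home, target));   /* slot 100 */
        return 0;
    }

A forwarding server would compute this slot for the requested address and hand the request to whichever host its map records there.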
A network of servers (and the associated maps) is not a static set of data. Servers will join and leave the network. Failure will be common. Relocation will be common. These sorts of things have to be easily accommodated by the SFS protocol.
A set of SFS servers begins with a single server A, whose map has "home address" H and 255 entries, all pointing back to itself. It is the only server in the network. This server can be solicited by any other new server B which wants to join the network. A then inserts B's IP address (or DNS name) into slot 127 of its map. This effectively cuts the address space served by A in half. A then assigns B an address randomly chosen between (2^127)-H and 2^127 as "home address", and transmits its map (consisting of all-A entries) to B. B then inserts H in its map at the natural (-127th) slot. This completes a "split" of the address space. When A next receives a solicitation, it will assign the new server an address from the slot (- 2^127)-H, then (2^126)-H, (- 2^126)-H, (2^125)-H, etc. until it has split off every entry in the map.
Each time it assigns an address to a new server, A transmits the entire 255-entry map to the new host. Since the "perspective" of the map changes between one host and another, many entries in the transmitted map will have the same truncated logarithmic offset when viewed from the new host's home address. Only the first entry in a given slot in the new host's map is kept; additional entries are discarded. Likewise, A will not assign more than one new address from a given slot. Once a server has assigned a non-local entry to a slot, the slot is consumed and not available for further splitting. If A receives further solicitations, it will pass them on to hosts in its map. All servers follow the same protocol for splitting.
Since it is possible, as the space becomes more densely populated, that two different servers may attempt to hand out the same address, a server wishing to assign an address to a soliciting host must query the address and confirm that no server currently considers it a "home address", or is serving a file with that exact numerical address. So long as the chosen number is "free" in this respect, it can be arbitrarily assigned.
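The splitting behaviour can be sketched as below. The order in which slots are consumed (127, -127, 126, -126, ...) follows the text above; the exact address range handed out per slot, and all of the extern helpers, are assumptions rather than specified protocol.

    /* Sketch of address-space splitting.  random_address_in_slot() and the
     * other extern helpers are hypothetical placeholders. */
    #include <stdbool.h>

    typedef unsigned __int128 sfs_addr;

    #define SLOTS 255
    #define MAP_SLOT(s) ((s) + 127)              /* slot s in [-127, 127] -> array index */

    struct server {
        sfs_addr    home;                        /* this server's home address H          */
        const char *map[SLOTS];                  /* IP or DNS name recorded in each slot  */
        bool        consumed[SLOTS];             /* slot already split off to a newcomer? */
    };

    /* hypothetical helpers */
    extern sfs_addr random_address_in_slot(sfs_addr home, int slot);
    extern bool     address_in_use(sfs_addr a);  /* a home address, or a stored file, there? */
    extern void     assign_home_address(const char *newcomer, sfs_addr home);
    extern void     transmit_map(const char *newcomer, const struct server *s);
    extern void     forward_solicitation(const struct server *s, const char *newcomer);

    void handle_solicitation(struct server *s, const char *newcomer)
    {
        /* consume slots in the order 127, -127, 126, -126, ... */
        for (int k = 127; k >= 1; k--) {
            int order[2] = { k, -k };
            for (int i = 0; i < 2; i++) {
                int slot = order[i];
                if (s->consumed[MAP_SLOT(slot)])
                    continue;                    /* at most one assignment per slot, ever   */
                sfs_addr home = random_address_in_slot(s->home, slot);
                if (address_in_use(home))
                    continue;                    /* must be confirmed free before assigning */
                s->map[MAP_SLOT(slot)]      = newcomer;
                s->consumed[MAP_SLOT(slot)] = true;
                assign_home_address(newcomer, home);
                transmit_map(newcomer, s);       /* newcomer re-buckets the entries from its
                                                    own perspective, keeping the first entry
                                                    that lands in each of its slots          */
                return;
            }
        }
        forward_solicitation(s, newcomer);       /* fully split: pass the request along */
    }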
While temporary failures of individual hosts may be recovered from reasonably well by the redundancy inherent in the hash chains, it is possible that a host serving a large portion of the address space may simply drop off the network. Careful observation of the above splitting protocol will show, however, that such a server will eventually have all of its addresses re-assigned at random by its neighbours, as it will not be online to stake a claim on any such addresses. It may be desirable, in addition, to have a neighbour take over service for a failed server; this special type of solicitation is a simple variation on the original randomized solicitation, and allows for easy recovery of a large set of addresses, if space is short.
SFS is a novel design for a simplified, distributed filesystem with high reliability and security. It is not necessarily the fastest or most easily administered filesystem (from the point of view of conventional, centralized control of resources), but it may serve a useful role in many wide area applications such as publishing, file sharing, archival, and multi-user collaboration. I would like to see a sort of volunteer "distributed.net of filesystems" arise from people's spare hard disk space, for instance. Whether SFS scales down to the size of a single workgroup or cluster remains to be seen in implementation.
The design is mostly my own work (Graydon Hoare) but I received useful insight from Justin Wells and Laurie Harper, and some inspiration from work by Ian Goldberg. It stems from an earlier idea I'd hoped to implement for key management, which is filed away as an internet draft. Adam Back helpfully pointed me towards this proposal from Ted Anderson, who clearly has more of a clue than I do. I also discussed this sort of stuff with Ian Clarke in email for a while, but he seemed totally uninterested in making a filesystem (with things like the ability to update data, or even ensure it stays online if it's not popular); nonetheless it seems he's made enough people keen on "p2p" systems that it doesn't really matter which cryptosystem you use underneath anymore.
No implementation of this filesystem currently exists. It is a concept only. As such, well, I don't know if it's meaningful to copyright something when I really mean to be patenting it, but I want it to be clear that at least this document is copyright (C) 1999 Graydon Hoare, and licensed to all parties under the terms of the GNU GPL v 2.0+. I don't know if that really holds any water in court, but if you implement this, patent it, and then try to sue me when I implement it, I will never ever come to your birthday party. You're also more than welcome to implement it under GPL.
I'm hoping to do an implementation of SFS if I get enough interest from others and a spare few months to put it together. If you are interested in implementing it (or perhaps contracting me to implement it), please contact me.