The next incarnation of the Internet will liberate both the content and the CPU cycles from the actual hardware that performs storage and computation. Services such as Publius, Freenet and Napster already hint at what true distributed storage might be like, while distributed.net and seti@home hint at what distributed cycle-serving might be like. By combining these services with a good content description system, a finder, a dose of e-rights and distributed locks, in a trivial-to-administer, secure chroot jail, the future will be upon us.
To summarize Gelernter's points in terms of present-day technologies, let's examine the following influences/confluences:
If you are reading this web page near the turn of the century, chances are good that your browser fetched it off of the web server I run at home. Chances are also good that you got it off some caching proxy; I know my ISP runs one. But I know that my life would be a lot better if I didn't actually have to be the sysadmin for the server I run at home. I would like it much better if I could just publish this page, period, and not worry about maintaining the server or doing backups. Just publish it on Freenet or Publius. If everyone's home computer were automatically a node/server on Publius, and if Publius required zero system administration, then I, as a writer/publisher, would be very happy. I could just write these thoughts, and not worry about the computing infrastructure that makes sure you can read them. We conclude that the eternity service is an important component of Gelernter's Manifesto, one which he sadly fails to name as a contributing technology.
A crucial component of this idea is that of 'zero administration': the ultimate system must be so simple that any PC connected to the net could become a node, a part of the distributed storage infrastructure. The owner of a PC (e.g. my mom) should not have to give it much thought: if it's hooked up to the Internet, it's a part of the system.
See also:
Unfortunately, search engines do *not* relieve me of the duty of adding hyperlinks to my writing. This is a bit tedious, a bit odious. It would be nice if any phrase in this hypertext were in fact a link that pointed, more or less, at whatever I intended it to point at. If that were possible, then we really would have content-addressable memory. Furthermore, search engines are limited to ASCII text: they are essentially useless for binary content. To find binary content, one must now visit specialized sites, such as rufus.w3.org to locate RPMs, tucows to locate shareware, or mp3.com or scour.net to find audiovisual content. Each of these systems is appallingly poor at what it does: the RPM spec file is used to build the rufus directories, but doesn't really contain adequate information. The mp3 and shareware sites are essentially built by hand: that part of the world doesn't even have the concept of an LSM to classify and describe content! (LSM is a machine-readable format used by metalab.unc.edu to classify the content of packages in its software repository.)
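To make the LSM point concrete, here is a rough sketch of what such a machine-readable entry looks like and how trivially it can be parsed. The field names are only approximately those of the real LSM format, and the sample entry is invented for illustration:

```python
# A rough illustration of why an LSM-style entry is "machine-readable":
# simple "Field: value" lines that a few lines of code can turn into a record.
SAMPLE_LSM = """\
Begin3
Title:          example-tool
Version:        1.2
Entered-date:   2000-11-05
Description:    A small utility, described here so that it can be indexed.
Keywords:       example, utility
Author:         someone@example.org
Primary-site:   metalab.unc.edu /pub/Linux/utils
End
"""

def parse_lsm(text):
    """Turn an LSM-style entry into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

print(parse_lsm(SAMPLE_LSM)["Keywords"])   # -> "example, utility"
```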
What is really needed is an infrastructure for more closely defining the content of a 'file' in both machine-readable and human-understandable terms. At the very least, there is the concept of mime-types. Web-page designers can use the <meta> tags to define some additional info about an object. With the growth in popularity of XML, there is some hope that XML DTDs can be used to understand the type of an object. There is the semi-forgotten, semi-ignored concept of 'object naming' and 'object trading brokers' as defined by CORBA, which attempt to match an object request to any object that can fill that request, rather than to an individually named object. Finally, there are sporadic attempts to classify content: LSMs used by metalab.unc.edu, RPM spec files used by rufus.w3.org, debs used by the Debian distribution. MP3s have an extremely poor content description mechanism: one can store the name of the artist, the title, the year and the genre. But these are isolated examples with no unifying structure.
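As a sketch of the kind of unified record such an infrastructure might standardize (the field names below are invented for illustration, not taken from any existing proposal):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ContentDescription:
    """One record that could subsume an LSM, RPM/deb metadata, and an MP3 tag."""
    mime_type: str                       # machine-readable: what kind of object this is
    title: str                           # human-readable: what it is called
    creator: Optional[str] = None        # author / artist / packager
    created: Optional[str] = None        # date, e.g. "1999"
    keywords: list = field(default_factory=list)
    description: str = ""                # free text, for humans and search engines
    relations: dict = field(default_factory=dict)   # e.g. {"depends-on": [...], "part-of": [...]}

song = ContentDescription(
    mime_type="audio/mpeg",
    title="Some Song",
    creator="Some Band",
    created="1999",
    keywords=["rock"],
)
```

The point is not these particular fields, but that a single structure of this sort could cover packages, shareware and audiovisual content alike, so that one finder or search engine could index them all.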
Unfortunately, Gelernter is right: there is no all-encompassing object description framework or proposal in existence that can fill these needs. We need something more than a mime-type, and something less than a free-text search engine, to help describe and locate an object. The system must be simple enough to use everywhere: one might desire to build it into the filesystem, in the same way that 'owner' and 'modification date' are file attributes. It will have to become a part of the 'finder', such as the Apple Macintosh Finder or Nautilus, Eazel's finder. It must be general enough to describe non-ASCII files, so that search engines (such as Google) could perform intelligent searches for binary content. Today, Google can neither classify nor return content based on LSMs, RPMs, debs, or the MP3 artist/title/genre fields.
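One existing mechanism that hints at how this could be built into the filesystem is extended attributes, which let arbitrary metadata ride along with a file just as 'owner' and 'modification date' do. A minimal sketch using the Linux extended-attribute calls exposed by Python's os module (the 'user.descr.*' attribute names are invented for illustration):

```python
import os

# Extended attributes attach metadata to a file without touching its contents.
# Linux-only; requires a filesystem that supports xattrs (ext4, xfs, ...).
path = "song.mp3"
open(path, "ab").close()                          # make sure the file exists

os.setxattr(path, b"user.descr.artist", b"Some Band")
os.setxattr(path, b"user.descr.genre",  b"rock")

# A 'finder' or a crawler could then read the description without
# understanding the MP3 format at all:
print(os.getxattr(path, b"user.descr.artist").decode())
```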
The gotcha is that there is currently no distributed computing client that is 'foolproof': providing generic services, easy to install and operate, hard for a cracker/hacker to subvert. There are no easy programming APIs. Commercial startups Popular Power and Process Tree Network offer money for distributed CPU cycles. A criticism of these projects might be that they are centered on large, paying projects: thus, there is no obvious way for smaller projects or individuals to participate. In particular, I might have some application that needs only hundreds of computers for a week, not tens of thousands for a year. Can I, as a small individual, get access to the system? This is important: the massive surge in the popularity of the Internet/WWW came precisely because it gave "power to the people": individual webmasters could publish whatever they wanted. There was no centralized authority; rather, there was a loose confederation. It seems to me that the success of distributed computing also depends on a means of not just delegating rights and authorities, but bringing them to the community for general use and misuse.
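For the sake of argument, the kind of 'easy programming API' I have in mind might look like the following sketch: a small project describes its work as independent units, and some framework, entirely hypothetical here, farms them out to volunteer machines. A local process pool stands in for the network:

```python
from multiprocessing import Pool

def make_work_units(data, chunk_size=1000):
    """Split a job into independent units that any volunteer node could run."""
    for i in range(0, len(data), chunk_size):
        yield {"id": i // chunk_size, "payload": data[i:i + chunk_size]}

def run_unit(unit):
    """The code each volunteer executes, ideally inside a sandbox / chroot jail."""
    return {"id": unit["id"], "result": sum(unit["payload"])}

if __name__ == "__main__":
    data = list(range(10_000))
    # Stand-in for submitting to a real distributed network: run locally.
    with Pool() as pool:
        results = pool.map(run_unit, list(make_work_units(data)))
    print(sum(r["result"] for r in results))   # same answer as sum(data)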
Cosm provides a programming API that aims to meet the requirements of distributed computing. It is currently hamstrung by licensing issues: the current license makes commercial and non-commercial use difficult or impossible by requiring that the 'data' and 'results' be published, as well as the 'source code' used in the project. Many users will find these requirements impractical to live up to. My recommendation? GPL it!
Other clients:
References:
Alternative systems, such as Swarmcast, are being developed to solve this type of problem with a peer-to-peer infrastructure. The basic idea is that if some local client is receiving the same data, then it can rebroadcast the data to another nearby peer. Note, however, that the benefits of Swarmcast would be quickly diminished if, for example, Freenet nodes were widely deployed and the file were published through Freenet: essentially all of the interesting properties of Swarmcast are already embodied in distributed file systems. Swarmcast's commercial advantage may be that it gets more widely deployed than Freenet, but that advantage, if it exists at all, is short-term and might be quickly erased.
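A toy illustration of the Swarmcast idea: a client that has already received some chunks of a file hands them to a nearby peer, instead of that peer fetching them from the origin server. Everything here (the chunk size, the in-memory peers) is invented for illustration:

```python
# Toy sketch of peer-to-peer rebroadcast: whatever a peer already holds,
# it can serve to a neighbour, taking load off the origin server.
CHUNK = 64 * 1024

class Peer:
    def __init__(self):
        self.chunks = {}                       # chunk index -> bytes

    def receive_from_origin(self, index, data):
        self.chunks[index] = data

    def serve(self, index):
        """Rebroadcast a chunk we already hold to a nearby peer."""
        return self.chunks.get(index)

origin_file = [bytes([i]) * CHUNK for i in range(4)]

alice, bob = Peer(), Peer()
alice.receive_from_origin(0, origin_file[0])   # Alice got chunk 0 from the server
bob.chunks[0] = alice.serve(0)                 # Bob gets it from Alice, not the server
assert bob.chunks[0] == origin_file[0]
```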
I am not yet aware of any generally available streaming-media reflectors, other than those based on the MBONE.
Similarly, remember 'The Computer for the Rest of Us'? Well, before the web exploded, Marc Andreessen used to talk about 'The Internet for the Rest of Us'. Clearly, some GUI slapped on the Internet would make it far more palatable than the 'command line' of telnet and ftp. But a web browser is not just a pretty GUI slapped on telnet or ftp, and if it were, the WWW still wouldn't exist (what happened to 'gopher'? Simple: no pictures, no 'home pages'). The success of the WWW needed a new, simple, easy technology to make it go: HTTP and hyperlinks. The original HTTP and HTML were dirt-simple, and that was half the power of the early Internet. Without this simplicity and ease of use, the net wouldn't have happened.
What about 'the rest of us'? It wasn't just technology that made the Internet explode, it was what the technology could do. It allowed (almost) anyone to publish anything at a tiny fraction of the cost of traditional print/radio/TV publishing. It gave power to the people. It was a fundamentally democratic movement that was inclusive, that allowed anyone to participate, not just the rich, or the members of media empires. In a bizarrely different way, it is these same forces that power Napster: even if the music publishing industry hadn't fallen asleep at the wheel, it is the democratization that drives Napster. Rather than listening to what the music industry wants me to listen to, I can finally listen to what I want to listen to. At long last, I am able to match the artist to the artist's work, rather than listening to the radio and scratching my head: 'gee, I liked that song, but what the hell was the name of the artist?' Before Napster, I didn't know which music CD to buy, even when I wanted to buy one. I wasn't hip enough to have friends who knew the names of the cool bands, the CDs that were worth buying. Now, finally, I know the names of the bands that I like. Napster gives control back to the man in the street.
Likewise, the final distributed storage/computation infrastructure will have to address these same populist goals: it must be inclusive, not exclusive. Everyone must be able to participate. It must be for 'the rest of us'.
(N.B. These remarks are a bit off-base. Freenet now includes a date-based versioning scheme.)