Using file-grain connectivity to implement a peer-to-peer file system

Recent work has demonstrated a peer-to-peer storage system that locates data objects using O(log N) messages by placing objects on nodes according to pseudo-randomly chosen IDs. While elegant, this approach constrains system functionality and flexibility: files are immutable, directories and symbolic names are not supported, data location is fixed, and access locality is not exploited. This paper presents Mammoth, a peer-to-peer hierarchical file system that, unlike alternative approaches, supports a traditional file-system API, allows files and directories to be stored on any node, and adapts storage location to exploit locality, balance load, and ensure availability. Our approach handles all coordination at the granularity of files instead of nodes. In effect, the nodes that store a particular file act as its server, independently of other nodes in the system. The resulting system is highly available and robust to failure. Experiments with our prototype have yielded good results, but an important question remains: how will the system perform at massive scale? We discuss the key issues, some of which we have addressed and others of which remain open.