Pastwatch: A Distributed Version Control System

Pastwatch is a version control system that acts like a traditional client-server system when users are connected to the network: users see each other's changes as soon as the changes are committed. When a user is disconnected, Pastwatch still allows that user to read revisions from the repository, commit new revisions, and share modifications directly with other users, all without access to the central repository. In contrast, most existing version control systems require connectivity to a centralized server in order to read or update the repository. Each Pastwatch user's host keeps its own writable replica of the repository, including historical revisions. Users can synchronize their local replicas with each other or with one or more servers. Synchronization must handle inconsistency between replicas because users may commit concurrent and conflicting changes to their local replicas. Pastwatch represents its repository as a "revtree" data structure, which tracks the relationships among these conflicting changes, including any reconciliation. The revtree also ensures that the replicas eventually converge to identical images after sufficient synchronization. We have implemented Pastwatch and evaluated it in a setting distributed over North America, and we have been using it actively for more than a year. We show that the system scales beyond 190 users per project and that commit and update operations take only 2-4 seconds. Currently, five users and six different projects regularly use the system; they find that it is easy to use and that its replication has masked several network and storage failures.
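
The convergence property described above can be illustrated with a minimal sketch. It is not Pastwatch's actual implementation: the `Revision` and `Replica` classes, the use of a SHA-1 hash over parents and content as a revision identifier, and the set-union `sync` are all assumptions made for illustration. The idea shown is only that if revisions are immutable and each one names its parents, then merging two replicas by union is order-independent, concurrent commits appear as multiple heads, and a reconciliation is a revision with more than one parent.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """Immutable revision; hypothetical stand-in for a revtree node."""
    parents: tuple  # IDs of parent revisions; two or more record a reconciliation
    content: str

    @property
    def rev_id(self) -> str:
        # Assumption: a revision is identified by a hash of its parents and content.
        data = "|".join(self.parents) + "\n" + self.content
        return hashlib.sha1(data.encode()).hexdigest()


class Replica:
    """A user's local, writable copy of the repository (illustrative only)."""

    def __init__(self):
        self.revs = {}  # rev_id -> Revision

    def commit(self, content, parents=()):
        rev = Revision(tuple(sorted(parents)), content)
        self.revs[rev.rev_id] = rev
        return rev.rev_id

    def heads(self):
        # A head is a revision that no other revision names as a parent.
        referenced = {p for r in self.revs.values() for p in r.parents}
        return sorted(set(self.revs) - referenced)

    def sync(self, other):
        # Union of immutable revisions: the result does not depend on sync order.
        merged = {**self.revs, **other.revs}
        self.revs = dict(merged)
        other.revs = dict(merged)


alice, bob = Replica(), Replica()
root = alice.commit("v1")
alice.sync(bob)

# Both users commit while disconnected from each other.
alice.commit("v2 from alice", [root])
bob.commit("v2 from bob", [root])

alice.sync(bob)
fork = alice.heads()  # two heads: the concurrent commits

# One user reconciles the fork with a revision that has both heads as parents.
merged_id = alice.commit("v3 reconciled", fork)
alice.sync(bob)
```

After the final `sync`, both replicas hold the same five-entry history minus nothing: the root, the two concurrent revisions, and the reconciliation, with a single head. A real system would also need to transfer only the missing revisions and to store file contents efficiently, which this sketch ignores.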
