Scalable, distributed data structures for internet service construction

This paper presents a new persistent data management layer designed to simplify cluster-based Internet service construction. This self-managing layer, called a distributed data structure (DDS), presents a conventional single-site data structure interface to service authors, but partitions and replicates the data across a cluster. We have designed and implemented a distributed hash table DDS that has the properties necessary for Internet services: incremental scaling of throughput and data capacity, fault tolerance and high availability, high concurrency, consistency, and durability. The hash table uses two-phase commit to present a coherent view of its data across all cluster nodes, allowing any node to service any task. We show that the distributed hash table simplifies Internet service construction by decoupling service-specific logic from the complexities of persistent, consistent state management, and by allowing services to inherit the necessary service properties from the DDS rather than implementing those properties themselves. We have scaled the hash table to a 128-node cluster, 1 terabyte of storage, an in-core read throughput of 61,432 operations/s, and a write throughput of 13,582 operations/s.
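The combination of partitioning, replication, and two-phase commit described above can be illustrated with a minimal, single-process sketch. It is not the paper's implementation: the class names (`Replica`, `DistributedHashTable`), the SHA-256 based partition choice, and the rule that a write aborts when any replica fails to prepare are all assumptions made for illustration, and in-memory objects stand in for cluster nodes.

```python
import hashlib
from typing import Dict, List, Optional


class Replica:
    """Hypothetical stand-in for one cluster node holding a copy of a partition."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.store: Dict[str, bytes] = {}    # committed data
        self.pending: Dict[str, bytes] = {}  # prepared but not yet committed

    def prepare(self, key: str, value: bytes) -> bool:
        # Phase 1: stage the write and vote yes, or vote no if the node is down.
        if not self.alive:
            return False
        self.pending[key] = value
        return True

    def commit(self, key: str) -> None:
        # Phase 2 (success): make the staged write visible.
        self.store[key] = self.pending.pop(key)

    def abort(self, key: str) -> None:
        # Phase 2 (failure): discard the staged write.
        self.pending.pop(key, None)


class DistributedHashTable:
    """Single-site put/get interface over partitioned, replicated storage."""

    def __init__(self, num_partitions: int, replicas_per_partition: int) -> None:
        self.partitions: List[List[Replica]] = [
            [Replica(f"p{p}r{r}") for r in range(replicas_per_partition)]
            for p in range(num_partitions)
        ]

    def _partition_for(self, key: str) -> List[Replica]:
        # Assumption: the partition is chosen by hashing the key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def put(self, key: str, value: bytes) -> bool:
        prepared: List[Replica] = []
        for replica in self._partition_for(key):
            if replica.prepare(key, value):
                prepared.append(replica)
            else:
                # Any "no" vote aborts the write on every replica that prepared.
                for done in prepared:
                    done.abort(key)
                return False
        for replica in prepared:
            replica.commit(key)
        return True

    def get(self, key: str) -> Optional[bytes]:
        # Committed replicas agree, so any live replica can answer a read.
        for replica in self._partition_for(key):
            if replica.alive:
                return replica.store.get(key)
        raise RuntimeError("no live replica for this key")


table = DistributedHashTable(num_partitions=4, replicas_per_partition=2)
table.put("user:42", b"profile-bytes")
print(table.get("user:42"))
```

Because a write becomes visible only after every replica in the group has prepared, a reader never sees a value that exists on some replicas but not others. A real system would also need durable logging, timeouts, and recovery of failed replicas, none of which this sketch attempts.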
