Clustering Support and Replication Management for Scalable Network Services

The ubiquity of the Internet and various intranets has made a wide range of online services and applications available over the network. Cluster-based network services have emerged rapidly because clusters offer a cost-effective way to achieve high availability and incremental scalability. We present the design and implementation of the Neptune middleware system, which provides clustering support and replication management for scalable network services. Neptune employs a loosely connected and functionally symmetric clustering architecture to achieve high scalability and robustness. It shields application developers from clustering complexities through simple programming interfaces. In addition, Neptune provides replication management with flexible replication consistency support at the clustering middleware level. Because this support lives in the middleware rather than in each application, it can easily be applied to a large number of applications with different underlying data management mechanisms or service semantics. The system has been implemented on Linux and Solaris clusters, where a number of applications have been successfully deployed. Our evaluations demonstrate the system's performance and the smooth failure recovery achieved by the proposed techniques.
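To make "flexible replication consistency support at the clustering middleware level" more concrete, here is a minimal sketch of a middleware wrapper around the replicas of one service partition that lets the deployer pick a consistency option. Everything in it is a hypothetical illustration, not Neptune's actual interface: the names `ReplicaGroup`, `Replica`, `Update`, and `Consistency`, the two modes `WRITE_ANYWHERE` and `PRIMARY_ORDERED`, and the increment-only update type are all assumptions. In the write-anywhere mode, any replica accepts a write and the others catch up later; in the primary-ordered mode, every write passes through one primary so all replicas apply updates in the same order.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Consistency(Enum):
    # Hypothetical modes for illustration only; not Neptune's actual API.
    WRITE_ANYWHERE = "write-anywhere"    # any replica accepts a write; others catch up later
    PRIMARY_ORDERED = "primary-ordered"  # writes go through a primary, same order everywhere


@dataclass(frozen=True)
class Update:
    label: str
    key: str
    delta: int  # increments commute, so the final state does not depend on apply order


@dataclass
class Replica:
    name: str
    state: Dict[str, int] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)  # labels of applied updates, in apply order

    def apply(self, u: Update) -> None:
        self.state[u.key] = self.state.get(u.key, 0) + u.delta
        self.log.append(u.label)


class ReplicaGroup:
    """Hypothetical middleware wrapper around the replicas of one service partition."""

    def __init__(self, replicas: List[Replica], level: Consistency) -> None:
        self.replicas = replicas
        self.level = level
        # Writes accepted but not yet propagated: (index of accepting replica, update)
        self.pending: List[Tuple[int, Update]] = []

    def write(self, u: Update, at: int = 0) -> None:
        if self.level is Consistency.PRIMARY_ORDERED:
            # `at` is ignored: the primary (replica 0) fixes one global order
            # and pushes the update to every replica before returning.
            for r in self.replicas:
                r.apply(u)
        else:
            # Apply only at the contacted replica and defer propagation.
            self.replicas[at].apply(u)
            self.pending.append((at, u))

    def propagate(self) -> None:
        # Deliver each deferred update to every replica that has not yet seen it.
        for origin, u in self.pending:
            for i, r in enumerate(self.replicas):
                if i != origin:
                    r.apply(u)
        self.pending.clear()

    def converged(self) -> bool:
        return all(r.state == self.replicas[0].state for r in self.replicas)


if __name__ == "__main__":
    lazy = ReplicaGroup([Replica("a"), Replica("b")], Consistency.WRITE_ANYWHERE)
    lazy.write(Update("A", "hits", 1), at=0)
    lazy.write(Update("B", "hits", 2), at=1)
    print(lazy.converged())  # False: each replica has seen only its own write
    lazy.propagate()
    print(lazy.converged(), [r.log for r in lazy.replicas])
    # True [['A', 'B'], ['B', 'A']]
```

In this sketch the write-anywhere mode still converges even though the two replicas apply the updates in different orders, but only because increments commute. For updates whose effect depends on order, the primary-ordered mode is the safer choice, at the cost of routing every write through one replica.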
