Paper information - ACMS: the Akamai configuration management system

ACMS: the Akamai configuration management system

An important trend in information technology is the use of increasingly large distributed systems to deploy increasingly complex and mission-critical applications. For these systems to achieve the ultimate goal of offering ease-of-use properties similar to those of centralized systems, they must allow fast, reliable, and lightweight management and synchronization of their configuration state. This goal poses numerous technical challenges in a truly Internet-scale system, including varying degrees of network connectivity, inevitable machine failures, and the need to distribute information globally in a fast and reliable fashion. In this paper we discuss the design and implementation of a configuration management system for the Akamai Network. It allows reliable yet highly asynchronous delivery of configuration information, is significantly fault-tolerant, and can scale if necessary to hundreds of thousands of servers. The system is fully functional today, providing configuration management to over 15,000 servers deployed in 1200+ different networks in 60+ countries.
