Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining

Scalable management and self-organizational capabilities are emerging as central requirements for a generation of large-scale, highly dynamic, distributed applications. We have developed an entirely new distributed information management system called Astrolabe. Astrolabe collects large-scale system state, permitting rapid updates and providing on-the-fly attribute aggregation. This latter capability permits an application to locate a resource, and also offers a scalable way to track system state as it evolves over time. The combination of features makes it possible to solve a wide variety of management and self-configuration problems. This paper describes the design of the system with a focus upon its scalability. After describing the Astrolabe service, we present examples of the use of Astrolabe for locating resources, publish-subscribe, and distributed synchronization in large systems. Astrolabe is implemented using a peer-to-peer protocol, and uses a restricted form of mobile code based on the SQL query language for aggregation. This protocol gives rise to a novel consistency model. Astrolabe addresses several security considerations using a built-in PKI. The scalability of the system is evaluated using both simulation and experiments; these confirm that Astrolabe could scale to thousands and perhaps millions of nodes, with information propagation delays in the tens of seconds.
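The idea of SQL-based attribute aggregation can be illustrated with a minimal sketch. It is not Astrolabe's actual interface: the table schema, the attribute names (`load`, `nmembers`), the zone and host names, and the helper `aggregate_zone` are all assumptions made for illustration, and the peer-to-peer propagation of rows between nodes is omitted entirely. The sketch only shows how a single SQL aggregate query, applied at each level of a hierarchy, can summarize many nodes into one row.

```python
import sqlite3

def aggregate_zone(child_rows, query):
    """Summarize a zone by running an SQL aggregate over its children's rows.

    child_rows: list of (name, load, nmembers) tuples (hypothetical schema).
    query: an SQL aggregate over the table `children`.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE children (name TEXT, load REAL, nmembers INTEGER)")
    conn.executemany("INSERT INTO children VALUES (?, ?, ?)", child_rows)
    row = conn.execute(query).fetchone()
    conn.close()
    return row

# Hypothetical aggregation query: least-loaded member and total member count.
QUERY = "SELECT MIN(load), SUM(nmembers) FROM children"

# Leaf zones: each host reports its own load and counts as one member.
zone_a = [("host1", 0.9, 1), ("host2", 0.3, 1)]
zone_b = [("host3", 0.5, 1), ("host4", 0.7, 1), ("host5", 0.2, 1)]

summary_a = aggregate_zone(zone_a, QUERY)
summary_b = aggregate_zone(zone_b, QUERY)

# The parent zone applies the same query to its children's summary rows.
root = aggregate_zone(
    [("zone_a", *summary_a), ("zone_b", *summary_b)],
    QUERY,
)
print(root)
```

Because each level only sees one summary row per child, the amount of state handled at any one point stays small even as the number of leaves grows, which is the property the abstract attributes to on-the-fly aggregation.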
