The Dynamic Enterprise Bus

In this paper we present a prototype enterprise information run-time heavily inspired by the fundamental principles of autonomic computing. Through self-configuration, self- optimization, and self-healing techniques, this run-time targets next-generation extreme scale information-access systems. We demonstrate these concepts in a processing cluster that allocates resources dynamically upon demand.

[1]  David E. Culler,et al.  SEDA: an architecture for well-conditioned, scalable internet services , 2001, SOSP.

[2]  Sam Toueg,et al.  Unreliable failure detectors for reliable distributed systems , 1996, JACM.

[3]  Jerome H. Saltzer,et al.  End-to-end arguments in system design , 1984, TOCS.

[4]  Robbert van Renesse,et al.  Fireflies: scalable support for intrusion-tolerant network overlays , 2006, EuroSys.

[5]  John F. Shoch,et al.  The “worm” programs—early experience with a distributed computation , 1982, CACM.

[6]  Jeffrey O. Kephart,et al.  An architectural approach to autonomic computing , 2004, International Conference on Autonomic Computing, 2004. Proceedings..

[7]  GhemawatSanjay,et al.  The Google file system , 2003 .

[8]  Fred B. Schneider,et al.  NAP: practical fault-tolerance for itinerant computations , 1999, Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No.99CB37003).

[9]  David R. Karger,et al.  Chord: a scalable peer-to-peer lookup protocol for internet applications , 2003, TNET.

[10]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[11]  George Candea,et al.  Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization , 2005, Second International Conference on Autonomic Computing (ICAC'05).

[12]  Edmund H. Durfee,et al.  Planning and Resource Allocation for Hard Real-Time, Fault-Tolerant Plan Execution , 1999, Agents.

[13]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[14]  Louise E. Moser,et al.  Byzantine Fault Detectors for Solving Consensus , 2003, Comput. J..

[15]  Rachid Guerraoui,et al.  Muteness Failure Detectors: Specification and Implementation , 1999, EDCC.

[16]  Andreas Haeberlen,et al.  Proactive Replication for Data Durability , 2006, IPTPS.

[17]  Antony I. T. Rowstron,et al.  PAST: a large-scale, persistent peer-to-peer storage utility , 2001, Proceedings Eighth Workshop on Hot Topics in Operating Systems.

[18]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[19]  David E. Culler,et al.  The ganglia distributed monitoring system: design, implementation, and experience , 2004, Parallel Comput..

[20]  Michael K. Reiter,et al.  Unreliable intrusion detection in distributed computations , 1997, Proceedings 10th Computer Security Foundations Workshop.