Robust monitoring of network-wide aggregates through gossiping

We investigate the use of gossip protocols for continuous monitoring of network-wide aggregates under crash failures. Aggregates are computed from local management variables using functions such as SUM, MAX, or AVERAGE. For this type of aggregation, crash failures pose a particular challenge because of the problem of mass loss, namely, how to correctly account for contributions from nodes that have failed. In this paper we give a partial solution. We present G-GAP, a gossip protocol for continuous monitoring of aggregates that is robust against discontiguous failures, meaning that neighboring nodes do not fail within a short period of each other. We give formal proofs of correctness and convergence, and we evaluate the protocol through simulation using real traces. The simulation results suggest that the design goals for this protocol have been met. For instance, the tradeoff between estimation accuracy and protocol overhead can be controlled, and the protocol achieves high estimation accuracy (an error below about 5% in our measurements), even for large networks and frequent node failures. Further, we compare G-GAP against a tree-based aggregation protocol using simulation. Surprisingly, we find that the tree-based aggregation protocol consistently outperforms the gossip protocol at comparable overhead, in terms of both accuracy and robustness.
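The mass loss problem mentioned above can be illustrated with a small simulation. The sketch below is not G-GAP; it is a generic push-sum style averaging scheme on a fully connected network, written under our own assumptions (synchronous rounds, uniform random peer selection, and the hypothetical names `push_sum`, `crash_at`, and `crash_node`). Each node holds a pair (s, w), keeps half of it each round, and sends the other half to a random live peer; its estimate of AVERAGE is s/w. Without failures the totals of s and w are conserved, so every estimate converges to the true average. When a node crashes, the mass it holds at that moment disappears, and the survivors converge to the ratio of the remaining mass, which in general is not the average of the surviving nodes' values.

```python
import random


def push_sum(values, rounds, crash_at=None, crash_node=None, seed=0):
    """Synchronous push-sum averaging on a complete graph (illustrative only).

    Returns (estimates of live nodes, total s mass, total w mass).
    If crash_at is given, crash_node fails at the start of that round and
    the (s, w) mass it holds is lost without any recovery.
    """
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]  # "sum" mass, initially the local value
    w = [1.0] * n                   # "weight" mass, initially 1 per node
    alive = [True] * n

    for r in range(rounds):
        if r == crash_at:
            alive[crash_node] = False  # crash failure: held mass vanishes

        live = [i for i in range(n) if alive[i]]
        new_s = {i: 0.0 for i in live}
        new_w = {i: 0.0 for i in live}
        for i in live:
            peer = rng.choice(live)    # uniform random live peer (may be i)
            for target in (i, peer):   # keep one half, push the other half
                new_s[target] += s[i] / 2
                new_w[target] += w[i] / 2
        for i in live:
            s[i], w[i] = new_s[i], new_w[i]

    live = [i for i in range(n) if alive[i]]
    estimates = [s[i] / w[i] for i in live]
    return estimates, sum(s[i] for i in live), sum(w[i] for i in live)


if __name__ == "__main__":
    values = list(range(16))  # true average is 7.5
    est, s_tot, w_tot = push_sum(values, rounds=100)
    print("no crash:  ", est[0], "mass:", s_tot, w_tot)

    est, s_tot, w_tot = push_sum(values, rounds=100, crash_at=3, crash_node=5)
    print("with crash:", est[0], "mass:", s_tot, w_tot)
```

In the crash run the surviving nodes still agree with one another, but the value they agree on is determined by whatever mass happened to survive the failure. A protocol that is robust to crash failures has to detect the failure and restore or compensate for the lost mass, which is the problem the abstract describes.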
