STAR: Self-Tuning Aggregation for Scalable Monitoring

We present STAR, a self-tuning algorithm that adaptively sets numeric precision constraints to answer continuous aggregate queries over distributed data streams accurately and efficiently. Adaptivity and approximation are essential both for robustness to varying workload characteristics and for scalability to large systems. In contrast to previous studies, we treat the problem as a workload-aware optimization problem whose goal is to minimize the total communication load of a multi-level aggregation tree under a fixed error budget. STAR's hierarchical algorithm accounts for the update rate and variance of the input data distribution in a principled manner to compute an optimal error distribution, and it performs cost-benefit throttling to direct error slack to where it yields the largest benefit. Our prototype implementation of STAR in a large-scale monitoring system provides (1) a new distribution mechanism that enables self-tuning error distribution and (2) an optimization that reduces communication overhead in a practical setting by carefully distributing the initial, default error budgets. Through extensive simulations and experiments on a real network monitoring implementation, we show that STAR achieves significant performance benefits over existing approaches while still providing high accuracy and incurring low overhead.
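
As a rough illustration of what distributing a fixed error budget by update rate and variance could look like, the sketch below splits a SUM query's budget across the children of one aggregation node and applies a simple precision filter at each child. The cost model is an assumption made for this example and is not stated in the abstract: a child with update rate u, standard deviation s, and error bound d is taken to send about u * s^2 / d^2 messages, and minimizing the total subject to the bounds summing to the budget gives d proportional to (u * s^2)^(1/3). The names distribute_budget and PrecisionFilter are hypothetical, and cost-benefit throttling is not modeled.

```python
from typing import List, Optional


def distribute_budget(total_budget: float,
                      update_rates: List[float],
                      std_devs: List[float]) -> List[float]:
    """Split total_budget across children.

    Assumed cost model (illustrative only): child i sends roughly
    u_i * s_i**2 / d_i**2 messages. Minimizing the sum subject to
    sum(d_i) == total_budget yields d_i proportional to (u_i * s_i**2) ** (1/3).
    """
    weights = [(u * s * s) ** (1.0 / 3.0)
               for u, s in zip(update_rates, std_devs)]
    if not weights:
        return []
    total = sum(weights)
    if total == 0:
        # No child is expected to send updates: fall back to an even split.
        return [total_budget / len(weights)] * len(weights)
    return [total_budget * w / total for w in weights]


class PrecisionFilter:
    """Suppress an update while the value stays within +/- delta
    of the last value reported to the parent."""

    def __init__(self, delta: float) -> None:
        self.delta = delta
        self.reported: Optional[float] = None

    def offer(self, value: float) -> bool:
        """Return True if the value must be sent to the parent."""
        if self.reported is None or abs(value - self.reported) > self.delta:
            self.reported = value
            return True
        return False
```

Under this assumed model, a child that updates eight times as often as a sibling with the same variance receives twice the sibling's error bound. Because each child's reported value stays within its own bound of the true value, the error of the SUM at the parent stays within the total budget.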