Network Imprecision: A New Consistency Metric for Scalable Monitoring

USENIX Association, 8th USENIX Symposium on Operating Systems Design and Implementation

This paper introduces a new consistency metric, Network Imprecision (NI), to address a central challenge in large-scale monitoring systems: safeguarding accuracy despite node and network failures. To implement NI, an overlay that monitors a set of attributes also monitors its own state, so that queries return not only attribute values but also information about the stability of the overlay: the number of nodes whose recent updates may be missing and the number of nodes whose inputs may be double counted due to overlay reconfigurations. When NI indicates that the network is stable, query results are guaranteed to reflect the true state of the system. When the network is unstable, NI puts applications on notice that query results should not be trusted, allowing them to take corrective action such as filtering out inconsistent results. To implement NI's introspection scalably, our prototype introduces a key optimization, dual-tree prefix aggregation, which exploits overlay symmetry to reduce overheads by more than an order of magnitude. An evaluation of three monitoring applications demonstrates that NI flags inaccurate results while incurring low overheads, and that monitoring applications that use NI to select good information can improve their accuracy by up to an order of magnitude.
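
As a rough illustration of how an application might act on the two counts described above, the following sketch attaches them to each query result and discards results taken while the overlay looked unstable. All names here (`QueryResult`, `total_nodes`, `maybe_missing`, `maybe_double_counted`, `filter_stable`) and the 5% threshold are hypothetical choices made for this example; combining the two counts into a single fraction is likewise an assumption, not the paper's definition of NI.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class QueryResult:
    """A query answer plus overlay-stability counts (hypothetical layout)."""
    value: float               # aggregated attribute value
    total_nodes: int           # assumed: nodes the overlay believes exist
    maybe_missing: int         # nodes whose recent updates may be missing
    maybe_double_counted: int  # nodes whose inputs may be counted twice

    def imprecision(self) -> float:
        """Fraction of nodes whose contribution is in doubt (illustrative)."""
        if self.total_nodes <= 0:
            # With no known nodes, treat the result as fully untrusted.
            return 1.0
        doubtful = self.maybe_missing + self.maybe_double_counted
        return min(1.0, doubtful / self.total_nodes)


def filter_stable(results: Iterable[QueryResult],
                  threshold: float = 0.05) -> List[QueryResult]:
    """Keep only results whose doubtful fraction is at or below the threshold."""
    return [r for r in results if r.imprecision() <= threshold]


if __name__ == "__main__":
    samples = [
        QueryResult(100.0, 1000, 0, 0),     # stable overlay
        QueryResult(42.0, 1000, 300, 50),   # many nodes in doubt
        QueryResult(99.0, 1000, 20, 10),    # mildly unstable
    ]
    for r in filter_stable(samples):
        print(r.value, r.imprecision())
```

The threshold is the application's policy knob: a stricter value trades away more results for higher confidence in the ones that remain.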
