In-network approximate computation of outliers with quality guarantees

Wireless sensor networks are becoming increasingly popular for a variety of applications. Users frequently discover that the readings produced by the sensing elements of their motes are contaminated with outliers. Outlier readings can severely affect applications that rely on timely and reliable sensory data to provide the desired functionality. As a consequence, there is a recent trend to explore how techniques that identify outlier values based on their similarity to other readings in the network can be applied to sensory data cleaning. Unfortunately, most of these approaches incur an overwhelming communication overhead, which limits their practicality. In this paper we introduce an in-network outlier detection framework based on locality sensitive hashing, extended with a novel boosting process as well as efficient load balancing and comparison pruning mechanisms. Our method trades off bandwidth for accuracy in a straightforward manner and supports many intuitive similarity metrics. Our experiments demonstrate that our framework can reliably identify outlier readings using a fraction of the bandwidth and energy that would otherwise be required.
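To illustrate the bandwidth-for-accuracy trade-off, the following is a minimal sketch of one common locality sensitive hashing scheme, random hyperplane projection, which compresses a window of readings into a short bit signature. The abstract does not state which hashing family the framework uses, so this choice is an assumption, and every function name and parameter below is hypothetical. Longer signatures cost more bandwidth but yield a tighter estimate of the angle between two reading vectors.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes; all motes share the seed (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return [
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vector)) >= 0 else 0
        for h in hyperplanes
    ]


def hamming(sig_a, sig_b):
    """Number of positions in which two signatures differ."""
    return sum(x != y for x, y in zip(sig_a, sig_b))


def estimated_angle(sig_a, sig_b):
    """Estimate the angle (radians) between the original vectors from their signatures."""
    return math.pi * hamming(sig_a, sig_b) / len(sig_a)


def are_similar(sig_a, sig_b, angle_threshold):
    """Hypothetical similarity test: the estimated angle is within the threshold."""
    return estimated_angle(sig_a, sig_b) <= angle_threshold


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=256, dim=4)
    readings_a = [20.1, 20.3, 20.2, 20.4]   # made-up temperature window
    readings_b = [20.0, 20.4, 20.1, 20.5]   # a nearby mote with similar readings
    sig_a = signature(readings_a, planes)
    sig_b = signature(readings_b, planes)
    print(estimated_angle(sig_a, sig_b), are_similar(sig_a, sig_b, 0.2))
```

Under this sketch, motes would exchange only the fixed-size signatures rather than the raw reading windows, and a reading whose signature is similar to too few others would be flagged as an outlier candidate. The signature length is the knob that trades bandwidth for estimation accuracy.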
