Online outlier detection in sensor data using non-parametric models

Sensor networks have recently found applications in many different settings. Sensors at different locations generate streaming data, which can be analyzed in real time to identify events of interest. In this paper, we propose a framework that computes, in a distributed fashion, an approximation of multi-dimensional data distributions in order to enable complex applications in resource-constrained sensor networks. We motivate our technique in the context of outlier detection. We demonstrate how our framework can be extended to identify either distance- or density-based outliers in a single pass over the data and with limited memory requirements. Experiments with synthetic and real data show that our method is efficient and accurate, and compares favorably to other proposed techniques. We also demonstrate the applicability of our technique to other related problems in sensor networks.
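To make the idea of distance-based outlier detection over an approximated distribution concrete, the following is a minimal one-dimensional sketch. It is an illustration, not the framework itself. It assumes that the non-parametric model is a kernel density estimate built from a small sample of the most recent readings, and that a reading is a distance-based outlier when fewer than a threshold number of readings lie within distance r of it. The Epanechnikov kernel, the function names, and the parameters `window_size`, `bandwidth`, and `min_neighbors` are all assumptions introduced for this example.

```python
def epanechnikov_cdf(u):
    """CDF of the Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def estimated_neighbors(p, r, sample, window_size, bandwidth):
    """Estimate how many of the window's readings fall in [p - r, p + r].

    `sample` is a small subset of the window (assumed to be maintained
    elsewhere); each sampled value spreads its probability mass over
    [s - bandwidth, s + bandwidth] through the kernel.
    """
    mass = 0.0
    for s in sample:
        upper = epanechnikov_cdf((p + r - s) / bandwidth)
        lower = epanechnikov_cdf((p - r - s) / bandwidth)
        mass += upper - lower
    # Scale the average kernel mass up to the full window size.
    return window_size * mass / len(sample)


def is_distance_outlier(p, r, min_neighbors, sample, window_size, bandwidth):
    """Flag p when its estimated neighbor count is below the threshold."""
    return estimated_neighbors(p, r, sample, window_size, bandwidth) < min_neighbors
```

For instance, with a hypothetical sample `[0.0, 0.1, -0.1, 0.05, -0.05]` standing in for a window of 100 readings and a bandwidth of 0.2, a reading at 5.0 has no estimated neighbors within r = 0.3 and would be flagged, while a reading at 0.0 would not. Only the sample, rather than the full window, has to be stored, which is what keeps the memory footprint small in this sketch.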
