Continuous Outlier Monitoring on Uncertain Data Streams

Outlier detection on data streams is an important task in data mining, and it becomes harder when the data are uncertain. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which uses pruning to quickly determine the nature of uncertain elements (whether each is an outlier) and thereby improves efficiency. Furthermore, we propose a pruning approach, Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD), to reduce the detection cost. PCUOD estimates the outlier probability rather than computing it in full, which effectively reduces the amount of calculation. The cost of the incremental PCUOD algorithm is low enough to meet the demands of uncertain data streams. Finally, we propose a new method for handling parameter-variable queries in CUOD, which enables different queries to execute concurrently. To the best of our knowledge, this paper is the first work on outlier detection over uncertain data streams that can handle parameter-variable queries simultaneously. Our methods are verified on both real and synthetic data. The results show that they reduce the required storage and running time.
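The idea of estimating an outlier probability in order to skip exact computation can be illustrated with a minimal sketch. The abstract does not specify the data model or the pruning rule, so everything below is an assumption made for illustration and not the paper's CUOD or PCUOD algorithm: each element is a one-dimensional value with an existence probability, an element counts as an outlier when the probability that fewer than `k` of its neighbors within distance `R` exist is at least a threshold `tau`, and the pruning test is a simple Markov-inequality bound. The names `outlier_probability`, `is_outlier`, `R`, `k`, and `tau` are hypothetical.

```python
from typing import List, Tuple

# Assumed model: (value, existence probability) for each uncertain element.
Element = Tuple[float, float]


def outlier_probability(neighbor_probs: List[float], k: int) -> float:
    """Exact P(fewer than k neighbors exist), assuming independent existence.

    Uses a truncated Poisson-binomial dynamic program; requires k >= 1.
    """
    # dist[j] = probability that exactly j of the neighbors seen so far exist.
    # Only counts 0..k-1 are tracked; mass reaching k or more is dropped.
    dist = [1.0] + [0.0] * (k - 1)
    for p in neighbor_probs:
        for j in range(k - 1, 0, -1):
            dist[j] = dist[j] * (1.0 - p) + dist[j - 1] * p
        dist[0] *= 1.0 - p
    return sum(dist)


def is_outlier(window: List[Element], idx: int, R: float, k: int, tau: float) -> bool:
    """Decide whether window[idx] is an outlier, pruning before the exact DP."""
    value, _ = window[idx]
    probs = [p for i, (v, p) in enumerate(window)
             if i != idx and abs(v - value) <= R]

    # Pruning 1: with fewer than k candidate neighbors, the probability is 1.
    if len(probs) < k:
        return True

    # Pruning 2 (assumed rule): by Markov's inequality, P(X >= k) <= E[X] / k,
    # so P(X < k) >= 1 - E[X] / k. If this lower bound already reaches tau,
    # the exact computation is unnecessary.
    if 1.0 - sum(probs) / k >= tau:
        return True

    # Fall back to the exact probability.
    return outlier_probability(probs, k) >= tau


if __name__ == "__main__":
    window = [(1.0, 0.9), (1.1, 0.8), (1.2, 0.9), (5.0, 0.9)]
    for i in range(len(window)):
        print(i, is_outlier(window, i, R=0.5, k=2, tau=0.5))
```

In this sketch the two pruning tests can only confirm an outlier early; a streaming version would also need to update neighbor lists incrementally as the window slides, which is left out here.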
