Research on real-time outlier detection over big data streams

ABSTRACT Technological advances have made big data streams common in many applications, including mobile internet applications, the Internet of Things, and industrial production processes. In many of these settings, outliers must be detected in the stream as it arrives. However, the special characteristics of big data streams, such as transiency, uncertainty, multidimensionality, dynamic distribution, and dynamic relationships, make outlier detection more challenging. This paper discusses the key issues, the major challenges, and the most frequently used existing methods for detecting outliers over big data streams, and then summarizes directions for further investigation. This research can provide theoretical support and technical guidance for outlier detection over big data streams.
