FRIOD: A Deeply Integrated Feature-Rich Interactive System for Effective and Efficient Outlier Detection

In this paper, we propose a novel interactive outlier detection system called feature-rich interactive outlier detection (FRIOD), which deeply integrates human interaction to improve detection performance and greatly streamline the detection process. A user-friendly interactive mechanism allows easy and intuitive user interaction in all the major stages of the underlying outlier detection algorithm, which include dense cell selection, location-aware distance thresholding, and final top outlier validation. This mitigates a major difficulty of competing outlier detection methods: specifying key parameter values such as the density and distance thresholds. An innovative optimization approach is also proposed for the grid-based space partitioning, which is a critical step of FRIOD. This optimization takes full account of the high-quality outliers detected with the aid of human interaction. The experimental evaluation demonstrates that FRIOD can improve the quality of the detected outliers and make the detection process more intuitive, effective, and efficient.
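To make the stages named above more concrete, the following is a minimal, non-interactive sketch of a generic grid-based detection pipeline: grid partitioning, dense cell selection, distance-based scoring, and a top-k candidate list. It is not the FRIOD algorithm. The function name `grid_outlier_sketch`, its parameters, and the scoring rule (distance to the nearest dense-cell centroid) are assumptions made purely for illustration. In particular, a single global ranking stands in for location-aware distance thresholding, and fixed arguments stand in for the values a user would choose interactively.

```python
import math
from collections import defaultdict


def grid_outlier_sketch(points, cell_size, density_threshold, top_k):
    """Hypothetical illustration of a grid-based pipeline; not FRIOD itself."""
    # 1. Grid-based space partitioning: map each point to an integer cell index.
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(p)

    # 2. Dense cell selection: keep cells with at least density_threshold points.
    #    (In an interactive system, the user would pick this threshold.)
    dense = {k: v for k, v in cells.items() if len(v) >= density_threshold}
    if not dense:
        return []

    # Centroid of each dense cell, used as a reference location.
    centroids = [
        tuple(sum(coord) / len(members) for coord in zip(*members))
        for members in dense.values()
    ]

    # 3. Score every point outside the dense cells by its distance to the
    #    nearest dense-cell centroid (an assumed, simplified scoring rule).
    scored = []
    for key, members in cells.items():
        if key in dense:
            continue
        for p in members:
            score = min(math.dist(p, c) for c in centroids)
            scored.append((score, p))

    # 4. Return the top-k candidates, highest score first, for validation.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    data = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.8), (1.2, 1.1), (0.3, 1.5),
            (3.0, 3.0), (10.0, 10.0)]
    for score, point in grid_outlier_sketch(data, 2.0, 3, 2):
        print(point, round(score, 2))
```

In this sketch the cell size and density threshold are the parameters that are hardest to set blindly, which is the difficulty the abstract says user interaction is meant to ease.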
