An Efficient Density-Based Local Outlier Detection Approach for Scattered Data

Since the local outlier factor (LOF) was first proposed, a large family of local outlier detection approaches has been derived from it. Because the existing approaches focus only on the overall separation between an object and its neighbors and ignore the degree of dispersion between them, their precision is degraded to varying degrees on scattered datasets. In addition, outliers make up only a small fraction of a dataset, yet the existing approaches compute the local outlier factor for every object during detection, which greatly reduces their efficiency. In this paper, we redefine the local outlier factor as the local deviation coefficient (LDC), which takes full advantage of the distribution of the object and its neighbors. We then propose a safe non-outlier elimination approach, rough clustering based on multi-level queries (RCMLQ), which preprocesses the dataset to eliminate as many non-outlier objects as possible. Finally, we propose an efficient local outlier detection approach, efficient density-based local outlier detection for scattered data (E2DLOS), based on the LDC and RCMLQ. Because RCMLQ greatly reduces the number of objects for which the local outlier factor must be computed, and the LDC is more sensitive to the degree of anomaly in scattered datasets, E2DLOS improves on existing local outlier detection approaches in both time efficiency and detection accuracy. Experiments show that the LDC better reflects the true abnormality of the data in scattered datasets. RCMLQ can also be used together with traditional methods for accelerating nearest neighbor search, which further improves the efficiency of E2DLOS by about 16%.
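
For context, the baseline that this family of approaches builds on can be sketched as follows. This is a minimal illustration of the classic local outlier factor, not the LDC, RCMLQ, or E2DLOS proposed here, whose definitions are not given in the abstract. The function names `knn` and `lof_scores` are hypothetical. The sketch assumes points with distinct coordinates, uses exactly k neighbors (ties at the k-distance are broken by index rather than all included), and uses a brute-force neighbor search, so it scores every object, which is the cost the abstract describes.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k):
    """Classic LOF score for every point (brute force, assumes distinct points)."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neigh]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(k / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (k * lrd[i]) for i in range(n)
    ]


if __name__ == "__main__":
    # Four points on a unit square plus one distant point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for p, s in zip(pts, lof_scores(pts, k=2)):
        print(p, round(s, 3))
```

Scores near 1 indicate an object whose density matches that of its neighbors, while scores well above 1 indicate a local outlier. In the small example, the distant point receives the largest score.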
