OFCOD: On the Fly Clustering Based Outlier Detection Framework

In data mining, outlier detection is a major challenge because it plays an important role in many applications, such as medical data analysis, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are inherently time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, meaning that each request is initiated individually with its own set of parameters. In this paper, we present On the Fly Clustering Based Outlier Detection (OFCOD), the first on-the-fly clustering-based outlier detection framework. OFCOD enables analysts to find outliers effectively and promptly on request, even within huge datasets. The proposed framework has been tested and evaluated on two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms existing approaches across several evaluation metrics.
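The abstract does not describe OFCOD's internals. For readers unfamiliar with the clustering-based family it builds on, the following is a minimal, generic sketch and not OFCOD's algorithm. It assumes k-means with deterministic farthest-first seeding, scores each point by its distance to the nearest centroid of a "large" cluster (at least `min_cluster_size` members), and flags points whose score exceeds an analyst-chosen threshold. All function names and parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding (an assumption, chosen for reproducibility):
    start at points[0], then repeatedly add the point farthest from all
    centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, labels


def outlier_scores(points: Sequence[Point], centroids: Sequence[Point],
                   labels: Sequence[int], min_cluster_size: int) -> List[float]:
    """Score = distance to the nearest centroid of a cluster with at least
    min_cluster_size members, so a point that forms its own tiny cluster
    still receives a high score."""
    sizes = [list(labels).count(i) for i in range(len(centroids))]
    large = [c for c, s in zip(centroids, sizes) if s >= min_cluster_size]
    if not large:
        raise ValueError("no cluster reaches min_cluster_size")
    return [min(math.dist(p, c) for c in large) for p in points]


def outliers_above(scores: Sequence[float], threshold: float) -> List[int]:
    """Answer one detection request: indices whose score exceeds threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (-0.3, 0.1), (0.2, -0.4),
            (10.0, 10.0), (10.4, 9.8), (9.7, 10.3), (10.2, 10.5), (9.9, 9.6),
            (50.0, 50.0)]
    cents, labs = kmeans(data, k=3)
    scores = outlier_scores(data, cents, labs, min_cluster_size=2)
    print(outliers_above(scores, 5.0))  # [10]
```

In this sketch the clustering pass is the expensive step, while `outliers_above` only scans precomputed scores. Requests that differ only in their threshold can therefore reuse one clustering rather than re-running it. This is offered only to illustrate why handling each request from scratch is costly, not as a description of how OFCOD serves requests.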
