An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures

Outlier detection, the discovery of observations that deviate from normal behavior, has become crucial in many application domains. Numerous and diverse algorithms have been proposed to detect such observations. Each algorithm identifies outliers according to its own precise definition of what an outlier is, so its performance depends largely on the application context. The construction of ensembles has been proposed as a way to improve on the individual capability of each algorithm. However, outlier detection typically operates in an unsupervised scenario (class labels are absent), which rules out ensemble approaches that rely on labels. In this paper, two novel unsupervised approaches using ensembles of heterogeneous types of detectors are proposed. Both approaches construct the ensemble using solely the results produced by each algorithm, identifying and giving more weight to the most suitable techniques for the particular dataset under examination. Through experimental evaluation on real-world datasets, we demonstrate that the proposed approaches provide a significant improvement over the base algorithms and even over existing approaches for ensemble outlier detection.
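
The general idea of weighting detectors using only their outputs can be illustrated with a minimal sketch. It is not the method proposed in the paper, whose details the abstract does not give. The sketch assumes that each detector's scores are rank-normalized to [0, 1], that similarity is measured by Pearson correlation between detectors, and that a detector's weight is proportional to its mean similarity to the others. The function names `rank_normalize`, `similarity_weights`, and `combine` are hypothetical.

```python
import numpy as np

def rank_normalize(scores):
    """Map each detector's raw scores (one column per detector) to [0, 1] by rank.

    Higher values mean "more outlying". Ties are broken by position,
    which is a simplification made for this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort(axis=0).argsort(axis=0)
    return ranks / (scores.shape[0] - 1)

def similarity_weights(norm_scores):
    """Weight each detector by its mean correlation with the other detectors.

    Assumption of this sketch: a detector that agrees with the rest of the
    ensemble is treated as more suitable for the dataset. Requires at least
    two detectors.
    """
    corr = np.corrcoef(norm_scores, rowvar=False)
    np.fill_diagonal(corr, 0.0)  # ignore self-similarity
    mean_sim = corr.sum(axis=1) / (corr.shape[0] - 1)
    mean_sim = np.clip(mean_sim, 0.0, None)  # negative agreement gets no weight
    if mean_sim.sum() == 0:
        # No positive agreement at all: fall back to uniform weights.
        return np.full(mean_sim.shape, 1.0 / mean_sim.size)
    return mean_sim / mean_sim.sum()

def combine(scores):
    """Return the weighted combined score per observation and the weights."""
    norm = rank_normalize(scores)
    weights = similarity_weights(norm)
    return norm @ weights, weights

# Toy example: detectors A and B agree; detector C partly disagrees.
toy_scores = np.column_stack([
    [0.1, 0.2, 0.3, 0.4, 0.5, 9.0],   # detector A
    [1.0, 2.0, 3.0, 4.0, 5.0, 50.0],  # detector B
    [3.0, 1.0, 2.0, 6.0, 5.0, 4.0],   # detector C
])
combined, weights = combine(toy_scores)
```

In this toy example no labels are used at any point: the weights come only from how the detectors' rankings relate to one another, which is the constraint the abstract describes for the unsupervised setting.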
