An Unsupervised Boosting Strategy for Outlier Detection Ensembles

Ensemble techniques have been applied to unsupervised outlier detection in some scenarios. Two challenges are central: generating diverse ensemble members and combining their individual results into an ensemble. For the latter challenge, some methods select a smaller ensemble from a large pool of candidate members in order to improve the diversity and accuracy of the ensemble, which relates to the ensemble selection problem in classification. We propose a boosting strategy for such combinations and show improvements on benchmark datasets.
