Class-Based Outlier Detection: Staying Zombies or Awaiting for Resurrection?

This paper addresses the task of finding outliers within each class in the context of supervised classification problems. Class-based outliers are cases that deviate markedly from the other cases of the same class. We introduce a novel method for outlier detection in labelled data based on Random Forests and compare it with existing methods on both artificial and real-world data. We show that it is competitive with the existing methods and sometimes gives more intuitive results. We also provide an overview of outlier detection in labelled data. The main contributions are two methods for class-based outlier description and interpretation.
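The abstract does not spell out how a Random Forest yields class-based outlier scores. One well-known way is Breiman's proximity-based outlier measure, sketched below. This is not necessarily the method proposed in this paper. Two cases are "close" when they land in the same leaf in many trees. A case whose squared proximities to the other members of its own class are small gets a large raw score. Raw scores are then normalized within each class. The function names, the median-absolute-deviation normalization, and the small epsilon guard are assumptions made for illustration. The input is a per-case list of leaf indices, one per tree, such as the matrix returned by scikit-learn's `RandomForestClassifier.apply`.

```python
from statistics import median


def proximity(leaves_a, leaves_b):
    """Fraction of trees in which two cases land in the same leaf."""
    same = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return same / len(leaves_a)


def class_outlier_scores(leaves, labels):
    """Hypothetical helper: Breiman-style outlier measure computed per class.

    leaves[i][t] is the leaf index of case i in tree t; labels[i] is its class.
    Higher scores mean the case is less similar to the rest of its own class.
    """
    n = len(leaves)
    raw = [0.0] * n
    for i in range(n):
        # Sum of squared proximities to the *other* cases of the same class.
        p = sum(
            proximity(leaves[i], leaves[k]) ** 2
            for k in range(n)
            if k != i and labels[k] == labels[i]
        )
        # Assumption: a tiny epsilon avoids division by zero for cases that
        # never share a leaf with any other case of their class.
        raw[i] = n / max(p, 1e-12)

    # Normalize within each class: subtract the class median and divide by
    # the median absolute deviation (assumed choice; 1.0 if it is zero).
    scores = [0.0] * n
    for cls in set(labels):
        idx = [i for i in range(n) if labels[i] == cls]
        med = median(raw[i] for i in idx)
        mad = median(abs(raw[i] - med) for i in idx) or 1.0
        for i in idx:
            scores[i] = (raw[i] - med) / mad
    return scores


# Toy example: 5 cases, 3 trees. Case 3 is labelled "a" but never shares a
# leaf with cases 0 and 1 of its own class.
leaves = [[0, 0, 0], [0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 1, 1]]
labels = ["a", "a", "a", "a", "b"]
scores = class_outlier_scores(leaves, labels)
print([round(s, 2) for s in scores])
```

In the toy example, case 3 receives by far the largest score even though it sits in the same leaves as case 4, because proximities are only compared against cases of the same class. This within-class comparison is what distinguishes a class-based outlier from a global one.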