One-Class Active Learning for Outlier Detection with Multiple Subspaces

Active learning for outlier detection involves users in the detection process by asking them to annotate observations with class labels. The usual assumption is that users can provide such feedback regardless of the nature and presentation of the results. This simplification may not hold in practice. To overcome it, we propose SubSVDD, a semi-supervised classifier that learns decision boundaries in low-dimensional projections of the data. SubSVDD deconstructs the outlier classification so that users can comprehend and interpret results more easily. For active learning, SubSVDD features a new update mechanism that adjusts decision boundaries based on user feedback. In particular, it takes into account that outliers may occur in only some of the low-dimensional projections. We conduct systematic experiments to show the effectiveness of our approach. In a comprehensive benchmark, SubSVDD outperforms alternative approaches on several data sets.
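
To illustrate the idea of one decision boundary per low-dimensional projection, the following is a minimal sketch and not the SubSVDD optimization itself. It assumes a deliberately crude boundary (a ball around the mean of each projection, with the radius set to a distance quantile) in place of a learned SVDD boundary, and it assumes an observation counts as an outlier if it falls outside the boundary in at least one projection. All function names and the `quantile` parameter are hypothetical.

```python
import math

def project(point, subspace):
    """Keep only the attributes listed in `subspace` (a tuple of indices)."""
    return [point[i] for i in subspace]

def fit_ball(points, quantile=0.9):
    """Crude stand-in for a learned boundary: centre = mean, radius = distance quantile."""
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    dists = sorted(math.dist(p, center) for p in points)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, radius

def fit_subspace_balls(data, subspaces, quantile=0.9):
    """Fit one boundary per subspace (assumption: subspaces are given up front)."""
    return {s: fit_ball([project(x, s) for x in data], quantile) for s in subspaces}

def outlying_subspaces(point, balls):
    """Return the subspaces in which `point` lies outside the boundary."""
    return [s for s, (center, radius) in balls.items()
            if math.dist(project(point, s), center) > radius]

def is_outlier(point, balls):
    """Assumed decision rule: outlier if outlying in at least one subspace."""
    return bool(outlying_subspaces(point, balls))

# Toy data: a 5x5 grid in attributes 0 and 1, with attribute 2 held constant.
data = [(x, y, 0.0) for x in range(5) for y in range(5)]
balls = fit_subspace_balls(data, subspaces=[(0, 1), (1, 2)])
candidate = (2, 2, 50.0)  # unremarkable in (0, 1), extreme in (1, 2)
flagged_in = outlying_subspaces(candidate, balls)
```

Reporting the per-subspace result (`flagged_in`) rather than a single global label is what lets a user inspect the projection in which an observation looks unusual.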
