Paper information - Converting SVDD scores into probability estimates: Application to outlier detection

To enable post-processing, the output of a support vector data description (SVDD) should be transformed into a calibrated probability, as can be done for SVMs. However, the standard SVDD estimates only a single level set and does not provide such probabilities. We present a method for estimating these probabilities from SVDD scores. The first step of our approach uses a generalization of the SVDD model that simultaneously estimates several coherent level sets. We then introduce two calibration mechanisms for converting these level sets into probabilities. A synthetic dataset and datasets from the UCI repository are used to compare the performance of our method against a robust kernel density estimator in an outlier detection task, illustrating the value of our approach.
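
The idea of turning a family of nested level sets into a probability-like outlier estimate can be illustrated with a deliberately simplified sketch. It is not the paper's generalized SVDD, nor either of its two calibration mechanisms: the training mean stands in for an SVDD center, Euclidean balls stand in for kernel level sets, and the function names (`fit_center`, `nested_radii`, `outlier_probability`) and the grid of `nu` values are hypothetical choices made only for this illustration.

```python
import math

def fit_center(points):
    """Mean of the training points (assumption: a crude stand-in for an SVDD center)."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def score(x, center):
    """Euclidean distance to the center; a larger score means a more outlying point."""
    return math.dist(x, center)

def nested_radii(train_scores, nus):
    """One radius per nu, chosen so that roughly a fraction nu of the
    training scores lies outside the ball. Because every radius is a
    quantile of the same scores, the balls are nested by construction."""
    ordered = sorted(train_scores)
    n = len(ordered)
    radii = {}
    for nu in nus:
        k = min(n - 1, max(0, math.ceil((1.0 - nu) * n) - 1))
        radii[nu] = ordered[k]
    return radii

def outlier_probability(s, radii):
    """Map a score to [0, 1] using the level sets that exclude it.
    A point outside the ball for a small nu is far in the tail, so the
    estimate is 1 minus the smallest nu whose ball excludes the point.
    A point inside every ball gets 0."""
    excluded_by = [nu for nu, r in radii.items() if s > r]
    if not excluded_by:
        return 0.0
    return 1.0 - min(excluded_by)

# Toy one-dimensional layout embedded in the plane (illustrative data only).
train = [(float(i), 0.0) for i in range(-10, 11)]
center = fit_center(train)
radii = nested_radii([score(p, center) for p in train], [0.1, 0.25, 0.5, 0.75, 0.9])

for point in [(0.0, 0.0), (4.0, 0.0), (20.0, 0.0)]:
    print(point, outlier_probability(score(point, center), radii))
```

The estimate is piecewise constant in the score and only as fine as the grid of `nu` values; a denser grid, or interpolation between adjacent level sets, would give a smoother mapping.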

Carole Lartizien | Stéphane Canu | Meriem El Azami
