Selecting training points for one-class support vector machines

This paper proposes a training-point selection method for one-class support vector machines (SVMs). It exploits a property of trained one-class SVMs: only points lying in the exterior region of the data distribution become support vectors. The proposed training-set reduction method therefore selects the so-called extreme points, which sit on the boundary of the data distribution, using local geometry and k-nearest neighbours. Experimental results demonstrate that the proposed method reduces the training set considerably, while the resulting model maintains generalization at the level of a model trained on the full training set, uses fewer support vectors, and trains faster.
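The abstract does not spell out the exact selection rule, but a minimal sketch of a k-nearest-neighbour boundary heuristic of this kind, using scikit-learn, might look like the following. The boundary score (the norm of the mean unit vector towards the k neighbours), the values of k and the kept fraction ratio, and the helper name select_extreme_points are illustrative assumptions, not the authors' method.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import OneClassSVM

def select_extreme_points(X, k=10, ratio=0.3):
    """Select boundary ('extreme') points via a k-NN local-geometry heuristic.

    For each point, take the resultant of the unit vectors pointing to its
    k nearest neighbours. Interior points have neighbours spread in all
    directions, so the resultant is short; boundary points have neighbours
    concentrated on one side, so it is long. (Illustrative score, assumed.)
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                # idx[:, 0] is the point itself
    scores = np.empty(len(X))
    for i, neigh in enumerate(idx[:, 1:]):
        diffs = X[neigh] - X[i]                         # vectors to neighbours
        norms = np.linalg.norm(diffs, axis=1)
        norms[norms == 0] = 1.0                         # guard duplicate points
        units = diffs / norms[:, None]
        scores[i] = np.linalg.norm(units.mean(axis=0))  # boundary score in [0, 1]
    n_keep = max(1, int(ratio * len(X)))
    return np.argsort(scores)[-n_keep:]                 # indices of extreme points

# Train the one-class SVM on the reduced set only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
keep = select_extreme_points(X, k=10, ratio=0.3)
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X[keep])
```

The rationale, as the abstract describes it, is that interior points rarely become support vectors of a one-class SVM, so discarding them should leave the learned decision boundary largely intact while shrinking the quadratic-programming problem solved at training time.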
