Covariance-guided One-Class Support Vector Machine

In one-class classification, the low-variance directions in the training data carry crucial information for building a good model of the target class. Boundary-based methods such as the One-Class Support Vector Machine (OSVM) preferentially separate the data from outliers along the large-variance directions. On the other hand, retaining only the low-variance directions can sacrifice some properties of the original data and is undesirable, especially when training samples are limited. This paper introduces a Covariance-guided One-Class Support Vector Machine (COSVM) classification method that emphasizes the low-variance projection directions of the training data without compromising any important characteristics. COSVM improves upon OSVM by controlling the direction of the separating hyperplane through incorporation of the covariance matrix estimated from the training data. The proposed method is a convex optimization problem with a single global optimum, which can be solved efficiently with existing numerical methods. The method also keeps the principal structure of OSVM intact and can be implemented easily with existing OSVM libraries. Comparative experiments against contemporary one-class classifiers on numerous artificial and benchmark datasets demonstrate that our method yields significantly better classification performance.

Highlights

- The low-variance directions are crucial for one-class classification (OCC).
- A new OCC method emphasizing the low-variance directions is proposed.
- The method incorporates covariance information into a convex optimization problem.
- It can be implemented and solved efficiently with existing software.
- Comparative experiments with contemporary classifiers show positive results.
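To make the idea of covariance guidance concrete, the following is an illustrative sketch only, not the formulation stated in the paper. It starts from the standard linear OSVM primal and assumes one plausible way the estimated covariance matrix could enter: through the regularizer. The trade-off parameter \eta, the matrix A, and the substitution v are assumptions introduced here for illustration.

```latex
% Standard OSVM primal (linear case), N training samples x_i:
\min_{w,\,\xi,\,\rho}\ \tfrac{1}{2}\, w^{\top} w
  + \frac{1}{\nu N} \sum_{i=1}^{N} \xi_i - \rho
\quad \text{s.t.}\quad w^{\top} x_i \ge \rho - \xi_i,\ \ \xi_i \ge 0 .

% Assumed covariance-guided variant: \Sigma is the sample covariance of
% the training data and \eta \in (0, 1] is a hypothetical trade-off parameter.
A = \eta I + (1 - \eta)\,\Sigma ,
\qquad
\min_{w,\,\xi,\,\rho}\ \tfrac{1}{2}\, w^{\top} A\, w
  + \frac{1}{\nu N} \sum_{i=1}^{N} \xi_i - \rho
\quad \text{s.t.}\quad w^{\top} x_i \ge \rho - \xi_i,\ \ \xi_i \ge 0 .

% A is positive definite for \eta > 0 because \Sigma is positive semidefinite,
% so the objective stays convex. With v = A^{1/2} w:
w^{\top} A\, w = v^{\top} v ,
\qquad
w^{\top} x_i = v^{\top} \bigl( A^{-1/2} x_i \bigr) .
```

Under this assumption, the term w^{\top} \Sigma w is the variance of the training data projected onto w, so penalizing it steers w away from high-variance directions. The substitution also shows that the modified problem reduces to a standard OSVM trained on the transformed samples A^{-1/2} x_i, which is one way an existing OSVM solver could be reused.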
