Feature selection in the Laplacian support vector machine

Traditional classifiers, including the support vector machine, use only labeled data in training. Labeled instances, however, are often difficult, costly, or time-consuming to obtain, while unlabeled instances are relatively easy to collect. The goal of semi-supervised learning is to improve classification accuracy by training classifiers on unlabeled data together with a small amount of labeled data. Recently, the Laplacian support vector machine has been proposed as an extension of the support vector machine to semi-supervised learning. Like the support vector machine, the Laplacian support vector machine suffers from limited interpretability. It also performs poorly when the training data contain many non-informative features, because the final classifier is expressed as a linear combination of informative and non-informative features alike. We introduce a variant of the Laplacian support vector machine that performs feature selection based on a functional analysis of variance (ANOVA) decomposition. Through analyses of synthetic and benchmark data, we illustrate that our method can be a useful tool in semi-supervised learning.
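
The abstract does not spell out the formulation, so the following is a sketch assembled from the manifold-regularization and COSSO literature rather than the authors' exact method. The standard Laplacian support vector machine of Belkin, Niyogi, and Sindhwani chooses a decision function f by minimizing

    \frac{1}{l} \sum_{i=1}^{l} \bigl(1 - y_i f(x_i)\bigr)_+ \;+\; \gamma_A \|f\|_K^2 \;+\; \gamma_I \, \mathbf{f}^\top L \, \mathbf{f},

where l is the number of labeled points, \|f\|_K is the reproducing kernel Hilbert space norm, L is the graph Laplacian built on all l + u labeled and unlabeled points, and \mathbf{f} = (f(x_1), \dots, f(x_{l+u}))^\top. A feature-selective variant in the spirit of the smoothing spline ANOVA / COSSO approach would decompose f additively over the p input features, f(x) = b + \sum_{j=1}^{p} f_j(x_j), and replace the squared norm with a sum of componentwise norms:

    \frac{1}{l} \sum_{i=1}^{l} \bigl(1 - y_i f(x_i)\bigr)_+ \;+\; \gamma_A \sum_{j=1}^{p} \|f_j\|_{\mathcal{H}_j} \;+\; \gamma_I \, \mathbf{f}^\top L \, \mathbf{f}.

As with the lasso, the non-smooth componentwise penalty can shrink entire components f_j to exactly zero, removing the corresponding features from the classifier while the Laplacian term still exploits the unlabeled data.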
