Subspace Regularization: A New Semi-supervised Learning Method

Most existing semi-supervised learning methods rely on the smoothness assumption that data points in the same high-density region should share the same label. Although this assumption works well in many cases, it has limitations. To overcome them, we introduce into semi-supervised learning the classic low-dimensional embedding assumption, which states that most of the geometric information in high-dimensional data lies on a low-dimensional manifold. Based on this, we formulate semi-supervised learning as the task of finding a subspace and a decision function on that subspace such that the projected data are well separated and the original geometric information is preserved as much as possible. Under this framework, the optimal subspace and decision function are found iteratively via a projection pursuit procedure. The low computational complexity of the proposed method makes it suitable for large-scale data sets. Experimental comparison with several previous semi-supervised learning methods demonstrates the effectiveness of our approach.
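
To make the alternating scheme concrete, here is a minimal NumPy sketch of the idea, not the paper's exact algorithm: the logistic loss, the PCA-style reconstruction penalty `lam * ||X - X W W^T||^2`, the joint gradient step, and the QR re-orthonormalization are all illustrative assumptions chosen for the example, as are the names `subspace_regularization`, `lam`, `lr`, and `n_iter`.

```python
import numpy as np

def subspace_regularization(X, y, labeled_idx, d=2, lam=1.0,
                            lr=1e-3, n_iter=200, seed=0):
    """Sketch only. X is (n, p) labeled+unlabeled data; y holds +/-1
    labels for the rows listed in labeled_idx."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = np.linalg.qr(rng.standard_normal((p, d)))[0]  # orthonormal subspace basis
    w, b = np.zeros(d), 0.0                           # linear decision function on the subspace

    Xl = X[labeled_idx]
    for _ in range(n_iter):
        # Separation term: logistic loss of the classifier on projected labeled points.
        Z = Xl @ W
        margin = y * (Z @ w + b)
        s = -y / (1.0 + np.exp(margin))               # per-sample d(loss)/d(margin)
        grad_w = Z.T @ s / len(y)
        grad_b = s.mean()
        grad_W_cls = Xl.T @ np.outer(s, w) / len(y)

        # Geometry term: reconstruction error ||X - X W W^T||_F^2 over *all*
        # points, so the unlabeled data shape the subspace as well.
        R = X - (X @ W) @ W.T
        grad_W_rec = -2.0 * (X.T @ R @ W + R.T @ X @ W) / n

        # Joint gradient step on both objectives, then re-orthonormalize W.
        w -= lr * grad_w
        b -= lr * grad_b
        W -= lr * (grad_W_cls + lam * grad_W_rec)
        W = np.linalg.qr(W)[0]

    return W, (w, b)
```

With the learned pair in hand, a new point would be classified in the subspace, e.g. `np.sign(X_new @ W @ w + b)`; the weight `lam` controls the trade-off between separating the projected labeled data and preserving the geometry of the full data set.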
