Manifold Coarse Graining for Online Semi-supervised Learning

When labeled data are scarce, Semi-Supervised Learning (SSL) methods use unlabeled data to improve classification. Many recent SSL methods build on the manifold assumption and operate in batch mode. However, when data arrive sequentially and in large quantities, both computation and storage become bottlenecks. In this paper, we present a new semi-supervised coarse graining (CG) algorithm that reduces the number of data points required to preserve the manifold structure. First, an equivalent formulation of Label Propagation (LP) is derived. Then, a novel spectral view of the Harmonic Solution (HS) is proposed. Finally, an algorithm that reduces the number of data points while preserving the manifold structure is provided, together with a theoretical analysis of how the LP properties are preserved. Experimental results on real-world datasets show that the proposed method outperforms the state-of-the-art coarse graining algorithm in different settings.
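
For readers unfamiliar with the Harmonic Solution mentioned above, the following is a minimal sketch of its standard batch form on a similarity graph, f_u = (D_uu - W_uu)^{-1} W_ul f_l. It is not the coarse graining algorithm of this paper; the function name `harmonic_solution`, its arguments, and the toy chain graph are assumptions made for illustration only.

```python
import numpy as np

def harmonic_solution(W, labeled_idx, y_labeled):
    """Standard batch harmonic solution (illustrative helper, not from the paper).

    W           : (n, n) symmetric, nonnegative similarity matrix.
    labeled_idx : indices of the labeled nodes.
    y_labeled   : (l, c) one-hot label matrix for the labeled nodes.
    Returns the unlabeled indices and their (u, c) soft label matrix.
    """
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    # Combinatorial graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # Block of the Laplacian over unlabeled nodes, and the
    # unlabeled-to-labeled block of the weight matrix.
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]

    # Solve L_uu f_u = W_ul f_l rather than forming an explicit inverse.
    f_u = np.linalg.solve(L_uu, W_ul @ y_labeled)
    return unlabeled_idx, f_u

# Toy example (assumed data): a 4-node chain 0-1-2-3 with unit weights,
# node 0 labeled as class 0 and node 3 labeled as class 1.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
y_l = np.array([[1., 0.],
                [0., 1.]])
unlabeled, f_u = harmonic_solution(W, [0, 3], y_l)
predicted = f_u.argmax(axis=1)  # hard class decision per unlabeled node
```

On this chain, each unlabeled node receives a soft label equal to the average of its neighbors' labels, so node 1 leans toward class 0 and node 2 toward class 1. The linear solve scales with the number of unlabeled points, which is the batch-mode cost that motivates reducing the number of stored points in an online setting.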
