Active learning on manifolds

Because the volume of available digital information is growing rapidly, it is often impossible to label all the samples. It is therefore crucial to select the most informative samples to label, so that learning performance improves as much as possible under a limited labeling budget. Many active learning algorithms have been proposed for this purpose. Most of these approaches effectively discover the Euclidean structure of the data space but do not respect its geometrical (manifold) structure. In this paper, we propose a novel active learning algorithm that explicitly considers the case in which the data are sampled from a low-dimensional sub-manifold embedded in the high-dimensional ambient space. The geodesic distance between two data points on the manifold is estimated by the shortest-path distance between the two corresponding vertices in the nearest-neighbor graph. By selecting the most representative points with respect to the manifold structure, our approach can effectively decrease the number of training examples the learner needs in order to achieve good performance. Experimental results on visual object recognition and text categorization demonstrate the effectiveness of the proposed approach.
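
The geodesic-distance estimate described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `knn_graph` and `geodesic_from`, the neighborhood size `k=2`, the Euclidean edge weights, and the semicircle toy data are all assumptions made for illustration. The sketch builds a symmetric nearest-neighbor graph and runs Dijkstra's algorithm to approximate distances along the manifold.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean edge weights.

    Hypothetical helper: the abstract does not specify how the graph is built.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Sort all other points by Euclidean distance to point i.
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrize so the graph is undirected
    return graph


def geodesic_from(graph, source):
    """Shortest-path (Dijkstra) distances from `source` to every vertex."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy data (assumption): 21 points on a unit semicircle, a 1-D manifold in 2-D.
n = 21
points = [
    (math.cos(math.pi * t / (n - 1)), math.sin(math.pi * t / (n - 1)))
    for t in range(n)
]
graph = knn_graph(points, k=2)
geo = geodesic_from(graph, 0)

euclid = math.dist(points[0], points[-1])  # straight line through ambient space
geodesic = geo[n - 1]                      # path that follows the semicircle
print(f"Euclidean: {euclid:.3f}, graph geodesic: {geodesic:.3f}")
```

On this toy curve the two endpoints are 2 apart in the ambient space, while the graph distance comes out slightly below the true arc length π, since each graph edge is a chord of the arc. This gap is what a purely Euclidean selection criterion would miss.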
