Label propagation through minimax paths for scalable semi-supervised learning

Abstract: Semi-supervised learning (SSL) is attractive for labeling large amounts of data. Motivated by the cluster assumption, we present a path-based framework for efficient large-scale SSL that propagates labels through only a few important paths between labeled and unlabeled nodes. Within this framework, minimax paths emerge as a minimal set of important paths in a graph, leading to a novel algorithm, minimax label propagation. With an appropriate stopping criterion, learning time is (1) linear in the number of nodes in the graph and (2) independent of the number of classes. Experimental results show that our method outperforms existing SSL methods, especially on large-scale data with many classes.
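The idea of propagating labels along minimax paths can be illustrated with a small sketch. This is not the authors' algorithm, only a generic multi-source bottleneck search under stated assumptions: the graph is undirected, edge weights are dissimilarities, the cost of a path is its largest edge weight, and each unlabeled node takes the label of the labeled node it reaches at the smallest such cost. The function name `minimax_label_propagation` and its arguments are hypothetical. Because it uses a binary heap, the sketch runs in O(m log n) time and does not reproduce the linear-time behavior or the stopping criterion mentioned in the abstract.

```python
import heapq


def minimax_label_propagation(num_nodes, edges, seeds):
    """Assign each node the label of its minimax-closest labeled node.

    num_nodes: number of nodes, indexed 0..num_nodes-1 (assumed layout).
    edges:     list of (u, v, weight) tuples for an undirected graph,
               where weight is a dissimilarity (smaller means closer).
    seeds:     dict mapping labeled node index -> label.
    Returns (labels, costs); costs[v] is the minimax path cost to v.
    """
    adjacency = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))

    costs = [float("inf")] * num_nodes
    labels = [None] * num_nodes
    heap = []
    for node, lab in seeds.items():
        costs[node] = 0.0
        labels[node] = lab
        heapq.heappush(heap, (0.0, node))

    # Dijkstra-style search, but a path costs max(edge weights), not the sum.
    while heap:
        cost_u, u = heapq.heappop(heap)
        if cost_u > costs[u]:
            continue  # stale heap entry
        for v, w in adjacency[u]:
            new_cost = max(cost_u, w)
            if new_cost < costs[v]:
                costs[v] = new_cost
                labels[v] = labels[u]
                heapq.heappush(heap, (new_cost, v))
    return labels, costs


# Two tight chains (weight 1) joined by one weak bridge (weight 5).
example_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 5.0), (3, 4, 1.0), (4, 5, 1.0)]
example_labels, example_costs = minimax_label_propagation(
    6, example_edges, {0: "a", 5: "b"}
)
```

In the toy graph, labels do not cross the weight-5 bridge, because every node can be reached from a seed in its own chain at cost 1. Note that the single search handles all classes at once, so the work does not grow with the number of classes.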