Label Propagation and Quadratic Criterion

Various graph-based algorithms for semi-supervised learning have been proposed in the recent literature. They rely on the idea of building a graph whose nodes are data points (labeled and unlabeled) and whose edges represent similarities between points. Known labels are used to propagate information through the graph in order to label all nodes. In this chapter, we show how these different algorithms can be cast into a common framework in which one minimizes a quadratic cost criterion whose closed-form solution is found by solving a linear system of size n (the total number of data points). The cost criterion naturally leads to an extension of such algorithms to the inductive setting, where one obtains test samples one at a time: the derived induction formula can be evaluated in O(n) time, which is much more efficient than solving the linear system again exactly (which in general costs O(kn²) time for a sparse graph in which each data point has k neighbors). We also use this induction formula to show that when the similarity between points satisfies a locality property, these algorithms are plagued by the curse of dimensionality with respect to the dimension of the underlying manifold.
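
To make the framework concrete, the following is a minimal NumPy sketch of the quadratic-criterion formulation described above, under assumptions not fixed by the abstract: a Gaussian similarity kernel, an unnormalized graph Laplacian L = D - W, a labeling-fit weight mu, and a small ridge term eps. The names (propagate_labels, induce, sigma) are illustrative, and the induction step is one plausible O(n) form of the induction formula referred to in the text, not necessarily the exact formula derived in the chapter.

```python
import numpy as np

def gaussian_affinity(X, sigma=0.5):
    """Dense affinity matrix W with W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-similarity
    return W

def propagate_labels(X, y, labeled_mask, mu=1.0, eps=1e-6, sigma=0.5):
    """Closed-form label propagation: minimize
       ||f_l - y_l||^2 + mu * f^T L f + mu * eps * ||f||^2
    by solving a single n x n linear system (sketch; assumes a dense graph)."""
    n = X.shape[0]
    W = gaussian_affinity(X, sigma)
    L = np.diag(W.sum(axis=1)) - W            # unnormalized graph Laplacian
    S = np.diag(labeled_mask.astype(float))   # selects the labeled points
    A = S + mu * L + mu * eps * np.eye(n)     # symmetric positive definite
    f = np.linalg.solve(A, S @ y)             # closed-form solution of size n
    return f

def induce(x_new, X, f, mu=1.0, eps=1e-6, sigma=0.5):
    """O(n) induction for a new test point: a similarity-weighted average of
    the propagated labels (one plausible form of the induction formula)."""
    w = np.exp(-((X - x_new) ** 2).sum(-1) / (2.0 * sigma ** 2))
    return w @ f / (w.sum() + mu * eps)
```

As a usage sketch, y would hold the known labels (e.g. +1/-1) at labeled points and 0 elsewhere, labeled_mask flags which points are labeled, and the sign of f (or of induce(x_new, ...)) gives the predicted class; solving the system once handles the transductive case, while induce handles new test points without re-solving it.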
