Asymptotic behavior of ℓp-based Laplacian regularization in semi-supervised learning

Given a weighted graph with N vertices, consider a real-valued regression problem in a semi-supervised setting, where one observes n labeled vertices and the task is to label the remaining ones. We present a theoretical study of ℓp-based Laplacian regularization under a d-dimensional geometric random graph model. We provide a variational characterization of the performance of this regularized learner as N grows to infinity while n stays constant; the associated optimality conditions lead to a partial differential equation that must be satisfied by the function estimate f̂. From this formulation we derive several predictions on the limiting behavior of the d-dimensional function f̂, including (a) a phase transition in its smoothness at the threshold p = d + 1, and (b) a tradeoff between smoothness and sensitivity to the underlying unlabeled data distribution P. Specifically, over the range p ≤ d the function estimate f̂ is degenerate and “spiky,” whereas for p ≥ d + 1 it is smooth. We show that the effect of the underlying density vanishes monotonically with p, so that in the limit p = ∞, corresponding to the so-called Absolutely Minimal Lipschitz Extension, the estimate f̂ is independent of the distribution P. Under the assumption of semi-supervised smoothness, ignoring P can lead to poor statistical performance; in particular, we construct a specific example for d = 1 demonstrating that p = 2 has lower risk than p = ∞, because the former penalty adapts to P while the latter ignores it. We also provide simulations that verify the accuracy of our predictions for finite sample sizes. Together, these properties show that p = d + 1 is an optimal choice, yielding a function estimate f̂ that is both smooth and non-degenerate while remaining maximally sensitive to P.
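To make the estimator concrete, the sketch below sets up ℓp-based Laplacian regularization on a small geometric random graph: N points are drawn from a density P on [0, 1], connected when within a radius ε, and the estimate f̂ minimizes the edge penalty Σ_{(i,j)} w_ij |f_i − f_j|^p subject to matching the n observed labels. This is a minimal illustrative sketch, not the authors' implementation; the function names, the choice of ε, and the use of a generic quasi-Newton solver are assumptions made for the example.

```python
# Illustrative sketch of l_p-based graph Laplacian regularization: minimize
# J_p(f) = sum over edges (i,j) of w_ij * |f_i - f_j|^p, with f fixed to the
# observed labels on the n labeled vertices. Names and the solver choice are
# assumptions for this example, not the paper's implementation.
import numpy as np
from scipy.optimize import minimize

def lp_laplacian_estimate(W, labeled_idx, labels, p=2.0):
    """Return f minimizing sum_{i<j} W[i,j] * |f_i - f_j|^p subject to
    f[labeled_idx] = labels (label interpolation on the graph)."""
    N = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    labels = np.asarray(labels, dtype=float)
    free_idx = np.setdiff1d(np.arange(N), labeled_idx)
    iu, ju = np.triu_indices(N, k=1)  # enumerate each vertex pair once
    w = W[iu, ju]

    def energy(f_free):
        f = np.empty(N)
        f[labeled_idx] = labels
        f[free_idx] = f_free
        return np.sum(w * np.abs(f[iu] - f[ju]) ** p)

    # Warm start at the label mean; for p >= 2 the objective is smooth and
    # convex, so a generic quasi-Newton solver suffices for this sketch.
    res = minimize(energy, np.full(free_idx.size, labels.mean()),
                   method="L-BFGS-B")
    f_hat = np.empty(N)
    f_hat[labeled_idx] = labels
    f_hat[free_idx] = res.x
    return f_hat

# Usage: a d = 1 geometric random graph with epsilon-neighborhood edges.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(size=200))   # N = 200 points from P = Unif[0, 1]
eps = 0.05
W = (np.abs(x[:, None] - x[None, :]) < eps).astype(float)
np.fill_diagonal(W, 0.0)
labeled_idx = np.array([10, 190])    # n = 2 labeled vertices
labels = np.array([0.0, 1.0])
f_hat = lp_laplacian_estimate(W, labeled_idx, labels, p=2.0)
```

Varying p in this sketch is one way to probe the predicted behavior: for p at or below d the interpolant concentrates near the labeled points ("spiky"), while larger p yields smoother estimates that depend progressively less on where the unlabeled points are sampled.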
