Replica theory for learning curves for Gaussian processes on random graphs

We use a statistical physics approach to tackle the challenging problem of predicting the performance of Gaussian process regression. Performance is quantified by the learning curve, defined as the average error as a function of the number of training examples. We assume that the Gaussian process prior is defined by a random walk kernel, that inputs are vertices of a random graph, and that outputs are noisy function values. We show that replica techniques can be used to obtain exact performance predictions in the limit of large graphs, after first rewriting the average error in terms of a graphical model. Conventionally, the Gaussian process kernel is only globally normalized, so that the prior variance can differ between vertices. As a more principled alternative we also consider local normalization, where the prior variance is uniform across vertices. The normalization constants for the prior then have to be defined as thermal averages in an unnormalized model, and this requires the introduction of a second, auxiliary set of replicas. Our results for both types of kernel normalization apply generically to all random graph ensembles constrained by a fixed but arbitrary degree distribution. We compare with numerically simulated learning curves and find excellent agreement, a significant improvement over existing approximations.
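
To make the setup concrete, here is a minimal numerical sketch, not the authors' code: it builds a random regular graph (the simplest ensemble with a fixed degree distribution), constructs a random walk kernel from the normalized graph Laplacian, applies either global or local normalization, and estimates the learning curve by averaging the Gaussian process posterior variance (the Bayes error for a matched model) over random draws of the training set. The parameter names `a_param`, `p_steps` and `noise_var` are illustrative assumptions, not values from the paper.

```python
import numpy as np
import networkx as nx

def random_walk_kernel(A, a_param=2.0, p_steps=10):
    """K = (I - L/a)^p with L = I - D^{-1/2} A D^{-1/2} the normalized Laplacian.

    Equivalently K = ((1 - 1/a) I + (1/a) D^{-1/2} A D^{-1/2})^p; taking
    a >= 2 keeps the eigenvalues of the base matrix non-negative, so K is
    positive semidefinite.
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    base = ((1.0 - 1.0 / a_param) * np.eye(len(deg))
            + (1.0 / a_param) * (d_inv_sqrt @ A @ d_inv_sqrt))
    return np.linalg.matrix_power(base, p_steps)

def normalize(K, local=False):
    """Global: unit prior variance on average. Local: unit variance at every vertex."""
    if local:
        s = np.sqrt(np.diag(K))
        return K / np.outer(s, s)
    return K / np.diag(K).mean()

def bayes_error(K, train, noise_var):
    """Posterior variance averaged over all vertices, given training vertex indices."""
    G = np.linalg.inv(K[np.ix_(train, train)] + noise_var * np.eye(len(train)))
    Kxt = K[:, train]
    post_var = np.diag(K) - np.einsum('ij,jk,ik->i', Kxt, G, Kxt)
    return post_var.mean()

rng = np.random.default_rng(0)
V = 500                                        # number of vertices
graph = nx.random_regular_graph(3, V, seed=0)  # fixed degree distribution: all degree 3
A = nx.to_numpy_array(graph)
K = normalize(random_walk_kernel(A), local=True)

# Learning curve: average error versus number of training examples n,
# averaged over random draws of the training set.
for n in (10, 50, 100, 200):
    errs = [bayes_error(K, rng.choice(V, n, replace=False), noise_var=0.1)
            for _ in range(20)]
    print(f"n = {n:4d}   mean error = {np.mean(errs):.4f}")
```

Switching to `local=False` gives the conventional, globally normalized kernel, whose prior variance varies with the local graph structure around each vertex; the replica predictions derived in the paper cover both normalization choices in the large-graph limit.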
