Automatic model construction with Gaussian processes

This work was supported by the Natural Sciences and Engineering Research Council of Canada, the Cambridge Commonwealth Trust, Pembroke College, a grant from the Engineering and Physical Sciences Research Council, and a grant from Google.
