with Sparse Kernels

[1]  R. Palmer,et al.  Solution of 'Solvable model of a spin glass' , 1977 .

[2]  Wray L. Buntine Theory Refinement on Bayesian Networks , 1991, UAI.

[3]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[4]  Zehua Chen Fitting Multivariate Regression Functions by Interaction Spline Models , 1993 .

[5]  Keinosuke Fukunaga,et al.  Statistical Pattern Recognition , 1993, Handbook of Pattern Recognition and Computer Vision.

[6]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[7]  Tomaso A. Poggio,et al.  Extensions of a Theory of Networks for Approximation and Learning , 1990, NIPS.

[8]  I. W. Wright Splines in Statistics , 1983 .

[9]  H. Omre Bayesian kriging—Merging observations and qualified guesses in kriging , 1987 .

[10]  Steve R. Waterhouse,et al.  Non-linear Prediction of Acoustic Vectors Using Hierarchical Mixtures of Experts , 1994, NIPS.

[11]  Martin Brown,et al.  Network Performance Assessment for Neurofuzzy Data Modelling , 1997, IDA.

[12]  N. Aronszajn Theory of Reproducing Kernels , 1950 .

[13]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[14]  Bradley P. Carlin,et al.  Markov Chain Monte Carlo in Practice: A Roundtable Discussion , 1998 .

[15]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[16]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[17]  Hagai Attias,et al.  Inferring Parameters and Structure of Latent Variable Models by Variational Bayes , 1999, UAI.

[18]  Tomaso A. Poggio,et al.  Regularization Networks and Support Vector Machines , 2000, Adv. Comput. Math..

[19]  Y. C. Pati,et al.  Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.

[20]  R. A. Gaskins,et al.  Nonparametric roughness penalties for probability densities , 1971 .

[21]  Gregory J. Wolff,et al.  Optimal Brain Surgeon and general network pruning , 1993, IEEE International Conference on Neural Networks.

[22]  Bernhard Schölkopf,et al.  Sparse Greedy Matrix Approximation for Machine Learning , 2000, International Conference on Machine Learning.

[23]  Hans Henrik Thodberg,et al.  A review of Bayesian neural networks with an application to near infrared spectroscopy , 1996, IEEE Trans. Neural Networks.

[24]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[25]  Geoffrey E. Hinton,et al.  Evaluation of Gaussian processes and other methods for non-linear regression , 1997 .

[26]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[27]  Peter Craven,et al.  Smoothing noisy data with spline functions , 1978 .

[28]  Steve R. Gunn,et al.  Data driven knowledge extraction of materials properties , 1999, Proceedings of the Second International Conference on Intelligent Processing and Manufacturing of Materials. IPMM'99 (Cat. No.99EX296).

[29]  B. Silverman,et al.  On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method , 1982 .

[30]  Elie Bienenstock,et al.  Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.

[31]  Bernhard Schölkopf,et al.  Prior Knowledge in Support Vector Kernels , 1997, NIPS.

[32]  A. Dawid Some Misleading Arguments Involving Conditional Independence , 1979 .

[33]  Massimiliano Pontil,et al.  On the V_γ Dimension for Regression in Reproducing Kernel Hilbert Spaces , 1999, ALT.

[34]  S. Duane,et al.  Hybrid Monte Carlo , 1987 .

[35]  R. Bellman,et al.  Adaptive Control Processes , 1964 .

[36]  Josef Kittler,et al.  Pattern recognition : a statistical approach , 1982 .

[37]  David Barber,et al.  Bayesian Classification With Gaussian Processes , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[38]  Stephen F. Gull,et al.  Developments in Maximum Entropy Data Analysis , 1989 .

[39]  Christopher M. Bishop,et al.  Neural networks for pattern recognition , 1995 .

[40]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[41]  G. Wahba A Comparison of GCV and GML for Choosing the Smoothing Parameter in the Generalized Spline Smoothing Problem , 1985 .

[42]  Noel A Cressie,et al.  Statistics for Spatial Data , 1992 .

[43]  G. Wahba Spline models for observational data , 1990 .

[44]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[45]  Michael I. Jordan,et al.  Mean Field Theory for Sigmoid Belief Networks , 1996, J. Artif. Intell. Res..

[46]  L. Devroye Non-Uniform Random Variate Generation , 1986 .

[47]  S. Roberts,et al.  Confidence Intervals and Prediction Intervals for Feed-Forward Neural Networks , 2001 .

[48]  Julian Besag,et al.  Markov Chain Monte Carlo for Statistical Inference , 2002 .

[49]  Christopher M. Bishop,et al.  A Hierarchical Latent Variable Model for Data Visualization , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[50]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[51]  Tommi S. Jaakkola,et al.  Feature Selection and Dualities in Maximum Entropy Discrimination , 2000, UAI.

[52]  Ole Winther,et al.  Gaussian Processes for Classification: Mean-Field Algorithms , 2000, Neural Computation.

[53]  Peter Sollich,et al.  Probabilistic Methods for Support Vector Machines , 1999, NIPS.

[54]  Paul S. Bradley,et al.  Feature Selection via Mathematical Programming , 1997, INFORMS J. Comput..

[55]  A. M. Walker On the Asymptotic Behaviour of Posterior Distributions , 1969 .

[56]  J. Weston,et al.  Support vector regression with ANOVA decomposition kernels , 1999 .

[57]  Martin Brown,et al.  Neurofuzzy adaptive modelling and control , 1994 .

[58]  Peter L. Bartlett,et al.  Functional Gradient Techniques for Combining Hypotheses , 2000 .

[59]  Bernhard Schölkopf,et al.  Semiparametric Support Vector and Linear Programming Machines , 1998, NIPS.

[60]  R. Tibshirani,et al.  Linear Smoothers and Additive Models , 1989 .

[61]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[62]  G. Wahba,et al.  A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines , 1970 .

[63]  Michael I. Jordan,et al.  Variational methods for inference and estimation in graphical models , 1997 .

[64]  David M. Allen,et al.  The Relationship Between Variable Selection and Data Augmentation and a Method for Prediction , 1974 .

[65]  Peter Sollich Probabilistic interpretations and Bayesian methods for support vector machines , 1999 .

[66]  T. Plate Accuracy Versus Interpretability in Flexible Modeling: Implementing a Tradeoff Using Gaussian Process Models , 1999 .

[67]  P. Laurent,et al.  A general method for the construction of interpolating or smoothing spline-functions , 1968 .

[68]  David Haussler,et al.  Probabilistic kernel regression models , 1999, AISTATS.

[69]  Andrew Harvey,et al.  Forecasting, Structural Time Series Models and the Kalman Filter , 1990 .

[70]  Zoubin Ghahramani,et al.  Factorial Learning and the EM Algorithm , 1994, NIPS.

[71]  P. Whittle On the Smoothing of Probability Density Functions , 1958 .

[72]  Michael Luby,et al.  Approximating Probabilistic Inference in Bayesian Belief Networks is NP-Hard , 1993, Artif. Intell..

[73]  S. Gunn Support Vector Machines for Classification and Regression , 1998 .

[74]  Tomaso A. Poggio,et al.  A Sparse Representation for Function Approximation , 1998, Neural Computation.

[75]  Michael I. Jordan,et al.  Loopy Belief Propagation for Approximate Inference: An Empirical Study , 1999, UAI.

[76]  Kenneth Rose,et al.  A Deterministic Annealing Approach for Parsimonious Design of Piecewise Regression Models , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[77]  P. Kitanidis Parameter Uncertainty in Estimation of Spatial Functions: Bayesian Analysis , 1986 .

[78]  P. Green Iteratively reweighted least squares for maximum likelihood estimation , 1984 .

[79]  Peter W. Glynn,et al.  Stationarity detection in the initial transient problem , 1992, TOMC.

[80]  David Barber,et al.  Ensemble Learning for Multi-Layer Networks , 1997, NIPS.

[81]  Tomaso A. Poggio,et al.  Regularization Theory and Neural Networks Architectures , 1995, Neural Computation.

[82]  Stephen P. Brooks,et al.  Markov chain Monte Carlo method and its application , 1998 .

[83]  Gunnar Rätsch,et al.  Kernel PCA pattern reconstruction via approximate pre-images , 1998 .

[84]  Chong Gu,et al.  Structured Machine Learning for Soft Classification with Smoothing Spline ANOVA and Stacked Tuning, Testing, and Evaluation , 1993, NIPS.

[85]  Michael I. Jordan Learning in Graphical Models , 1999, NATO ASI Series.

[86]  Carl E. Rasmussen,et al.  In Advances in Neural Information Processing Systems , 2011 .

[87]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[88]  Nello Cristianini,et al.  An introduction to Support Vector Machines , 2000 .

[89]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[90]  J. Friedman Multivariate adaptive regression splines , 1990 .

[91]  Michael E. Tipping The Relevance Vector Machine , 1999, NIPS.

[92]  Stéphane Mallat,et al.  Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[93]  Tom Minka,et al.  Expectation Propagation for approximate Bayesian inference , 2001, UAI.

[94]  Lennart Ljung,et al.  System Identification: Theory for the User , 1987 .

[95]  M. Gibbs,et al.  Efficient implementation of gaussian processes , 1997 .

[96]  Geoffrey E. Hinton,et al.  Bayesian Learning for Neural Networks , 1995 .

[97]  Alexander J. Smola,et al.  Sparse Greedy Gaussian Process Regression , 2000, NIPS.

[98]  D. Cox Asymptotics for $M$-Type Smoothing Splines , 1983 .

[99]  Dirk Husmeier,et al.  Neural Networks for Conditional Probability Estimation , 1999, Perspectives in Neural Computing.

[100]  Federico Girosi,et al.  Support Vector Machines: Training and Applications , 1997 .

[101]  G. Matheron Principles of geostatistics , 1963 .

[102]  Steve R. Waterhouse,et al.  Bayesian Methods for Mixtures of Experts , 1995, NIPS.

[103]  Zoubin Ghahramani,et al.  Variational Inference for Bayesian Mixtures of Factor Analysers , 1999, NIPS.

[104]  D. Rubinfeld,et al.  Hedonic housing prices and the demand for clean air , 1978 .

[105]  A. O'Hagan,et al.  Curve Fitting and Optimal Design for Prediction , 1978 .

[106]  Volker Tresp,et al.  A Bayesian Committee Machine , 2000, Neural Computation.

[107]  Mike Rees,et al.  Statistics for Spatial Data , 1993 .

[108]  G. Wahba Bayesian "Confidence Intervals" for the Cross-validated Smoothing Spline , 1983 .

[109]  David G. Lowe,et al.  Similarity Metric Learning for a Variable-Kernel Classifier , 1995, Neural Computation.

[110]  Nicholas G. Polson,et al.  On the Geometric Convergence of the Gibbs Sampler , 1994 .

[111]  Ingrid Daubechies,et al.  Ten Lectures on Wavelets , 1992 .

[112]  Michel Mouchart,et al.  Discussion on "Conditional independence in statistical theory" by A.P. Dawid , 1979 .

[113]  Steffen Gutjahr,et al.  Extended Bayesian learning , 1997, ESANN.

[114]  J. L. Walsh,et al.  The theory of splines and their applications , 1969 .

[115]  Sayan Mukherjee,et al.  Feature Selection for SVMs , 2000, NIPS.