Pattern Recognition and Machine Learning

Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

[1]  Karl Pearson F.R.S. LIII. On lines and planes of closest fit to systems of points in space , 1901 .

[2]  L. M. M.-T. Theory of Probability , 1929, Nature.

[3]  H. Hotelling Analysis of a complex of statistical variables into principal components. , 1933 .

[4]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[5]  H. Jeffreys An invariant form for the prior probability in estimation problems , 1946, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences.

[6]  L. Stein,et al.  Probability and the Weighing of Evidence , 1950 .

[7]  N. Metropolis,et al.  Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[8]  J. Blum Multidimensional Stochastic Approximation Methods , 1954 .

[9]  Richard Bellman,et al.  Adaptive Control Processes: A Guided Tour , 1961, The Mathematical Gazette.

[10]  Robert G. Gallager,et al.  Low-density parity-check codes , 1962, IRE Trans. Inf. Theory.

[11]  H. D. Block The perceptron: a model for brain functioning. I , 1962 .

[12]  Frank Rosenblatt,et al.  PRINCIPLES OF NEURODYNAMICS. PERCEPTRONS AND THE THEORY OF BRAIN MECHANISMS , 1963 .

[13]  T. W. Anderson ASYMPTOTIC THEORY FOR PRINCIPAL COMPONENT ANALYSIS , 1963 .

[14]  E. Nadaraya On Estimating Regression , 1964 .

[15]  G. S. Watson,et al.  Smooth regression analysis , 1964 .

[16]  S. M. Ali,et al.  A General Class of Coefficients of Divergence of One Distribution from Another , 1966 .

[17]  I. Miller Probability, Random Variables, and Stochastic Processes , 1966 .

[18]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[19]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[20]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[21]  A. M. Walker On the Asymptotic Behaviour of Posterior Distributions , 1969 .

[22]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .

[23]  L. Baum,et al.  An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process , 1972 .

[24]  Alan J. Mayne,et al.  Generalized Inverse of Matrices and its Applications , 1972 .

[25]  R. Mazo On the theory of brownian motion , 1973 .

[26]  G. C. Tiao,et al.  Bayesian inference in statistical analysis , 1973 .

[27]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[28]  H. Akaike A new look at the statistical model identification , 1974 .

[29]  J. Besag On Spatial-Temporal Models and Markov Fields , 1977 .

[30]  丸山 徹 Convex Analysisの二,三の進展について , 1977 .

[31]  A. N. Tikhonov,et al.  Solutions of ill-posed problems , 1977 .

[32]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[33]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[34]  A. Dawid Conditional Independence for Statistical Operations , 1980 .

[35]  J. Laurie Snell,et al.  Markov Random Fields and Their Applications , 1980 .

[36]  F. Krauss Latent Structure Analysis , 1980 .

[37]  V.W.S. Chan,et al.  Principles of Digital Communication and Coding , 1979 .

[38]  S. Adler Over-relaxation method for the Monte Carlo evaluation of the partition function for multiquadratic actions , 1981 .

[39]  Philip E. Gill,et al.  Practical optimization , 1981 .

[40]  Dorothy T. Thayer,et al.  EM algorithms for ML factor analysis , 1982 .

[41]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[42]  David Lindley Scoring rules and the inevitability of probability , 1982 .

[43]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, STOC '84.

[44]  Brian Everitt,et al.  An Introduction to Latent Variable Models , 1984 .

[45]  Josef Kittler,et al.  Contextual classification of multispectral pixel data , 1984, Image Vis. Comput..

[46]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Shun-ichi Amari,et al.  Differential-geometrical methods in statistics , 1985 .

[48]  G. Wahba A Comparison of GCV and GML for Choosing the Smoothing Parameter in the Generalized Spline Smoothing Problem , 1985 .

[49]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[50]  Jonathan D. Cryer,et al.  Time Series Analysis , 1986, Encyclopedia of Big Data.

[51]  J. Besag On the Statistical Analysis of Dirty Pictures , 1986 .

[52]  R. Hathaway Another interpretation of the EM algorithm for mixture distributions , 1986 .

[53]  Kin Hong Wong,et al.  Script recognition using hidden Markov models , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[54]  L. Sirovich Turbulence and the dynamics of coherent structures. II. Symmetries and transformations , 1987 .

[55]  S. Duane,et al.  Hybrid Monte Carlo , 1987 .

[56]  H. Reinhardt Statistical Decision Theory and Bayesian Analysis. Second Edition (James O. Berger) , 1987 .

[57]  M. J. D. Powell,et al.  Radial basis functions for multivariable interpolation: a review , 1987 .

[58]  Geoffrey J. McLachlan,et al.  Mixture models : inference and applications to clustering , 1989 .

[59]  David J. Spiegelhalter,et al.  Local computations with probabilities on graphical structures and their application to expert systems , 1990 .

[60]  James A. Anderson,et al.  Neurocomputing: Foundations of Research , 1988 .

[61]  M. Hodgson Reducing the computational requirements of the minimum-distance classifier , 1988 .

[62]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems , 1988 .

[63]  S. Ragazzini,et al.  Learning of word stress in a sub-optimal second order back-propagation neural network , 1988, IEEE 1988 International Conference on Neural Networks.

[64]  David S. Broomhead,et al.  Multivariable Functional Interpolation and Adaptive Networks , 1988, Complex Syst..

[65]  John Moody,et al.  Fast Learning in Networks of Locally-Tuned Processing Units , 1989, Neural Computation.

[66]  Ken-ichi Funahashi,et al.  On the approximate realization of continuous mappings by neural networks , 1989, Neural Networks.

[67]  Yann LeCun,et al.  Improving the convergence of back-propagation learning with second-order methods , 1989 .

[68]  Ross D. Shachter,et al.  Simulation Approaches to General Probabilistic Inference on Belief Networks , 2013, UAI.

[69]  Stephen F. Gull,et al.  Developments in Maximum Entropy Data Analysis , 1989 .

[70]  George Cybenko,et al.  Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..

[71]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[72]  Kurt Hornik,et al.  Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.

[73]  Kuo-Chu Chang,et al.  Weighing and Integrating Evidence for Stochastic Simulation in Bayesian Networks , 2013, UAI.

[74]  Biing-Hwang Juang,et al.  On the application of hidden Markov models for enhancing noisy speech , 1989, IEEE Trans. Acoust. Speech Signal Process..

[75]  D. Greig,et al.  Exact Maximum A Posteriori Estimation for Binary Images , 1989 .

[76]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[77]  V. Ramasubramanian,et al.  A generalized optimization of the K-d tree for fast nearest-neighbour search , 1989, Fourth IEEE Region 10 International Conference TENCON.

[78]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[79]  H. White,et al.  Universal approximation using feedforward networks with non-sigmoid hidden layer activation functions , 1989, International 1989 Joint Conference on Neural Networks.

[80]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[81]  F. Girosi,et al.  Networks for approximation and learning , 1990, Proc. IEEE.

[82]  W S McCulloch,et al.  A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.

[83]  Neil E. Cotter,et al.  The Stone-Weierstrass theorem and its application to neural networks , 1990, IEEE Trans. Neural Networks.

[84]  R. T. Cox Probability, frequency and reasonable expectation , 1990 .

[85]  Bernard Widrow,et al.  30 years of adaptive neural networks: perceptron, Madaline, and backpropagation , 1990, Proc. IEEE.

[86]  Kohji Fukunaga,et al.  Introduction to Statistical Pattern Recognition-Second Edition , 1990 .

[87]  David J. Spiegelhalter,et al.  Sequential updating of conditional probabilities on directed graphical structures , 1990, Networks.

[88]  M. Frydenberg The chain graph Markov property , 1990 .

[89]  Shang-Liang Chen,et al.  Orthogonal least squares learning algorithm for radial basis function networks , 1991, IEEE Trans. Neural Networks.

[90]  Vladik Kreinovich,et al.  Arbitrary nonlinearity is sufficient to represent all functions by neural networks: A theorem , 1991, Neural Networks.

[91]  Christian Jutten,et al.  Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture , 1991, Signal Process..

[92]  Christopher M. Bishop,et al.  A Fast Procedure for Retraining the Multilayer Perceptron , 1991, Int. J. Neural Syst..

[93]  Yoshifusa Ito,et al.  Representation of functions by superpositions of a step or sigmoid function and their applications to neural network theory , 1991, Neural Networks.

[94]  Marlin A. Koschat,et al.  Maximum Entropy Methods in Science and Engineering (Vol. 2) , 1991 .

[95]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[96]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[97]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[98]  Kurt Hornik,et al.  Approximation capabilities of multilayer feedforward networks , 1991, Neural Networks.

[99]  Jocelyn Sietsma,et al.  Creating artificial neural networks that generalize , 1991, Neural Networks.

[100]  J. Magnus,et al.  Matrix Differential Calculus with Applications in Statistics and Econometrics , 1991 .

[101]  Yann LeCun,et al.  Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network , 1991, NIPS.

[102]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[103]  Chris Bishop,et al.  Current address: Microsoft Research, , 2022 .

[104]  O. Mangasarian,et al.  Robust linear programming discrimination of two linearly inseparable sets , 1992 .

[105]  David J. C. MacKay,et al.  The Evidence Framework Applied to Classification Networks , 1992, Neural Computation.

[106]  Yann LeCun,et al.  Efficient Pattern Recognition Using a New Transformation Distance , 1992, NIPS.

[107]  David J. C. MacKay,et al.  Bayesian Interpolation , 1992, Neural Computation.

[108]  Babak Hassibi,et al.  Second Order Derivatives for Network Pruning: Optimal Brain Surgeon , 1992, NIPS.

[109]  S. Lauritzen Propagation of Probabilities, Means, and Variances in Mixed Graphical Association Models , 1992 .

[110]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[111]  Geoffrey E. Hinton,et al.  Simplifying Neural Networks by Soft Weight-Sharing , 1992, Neural Computation.

[112]  Geoffrey E. Hinton,et al.  Keeping the neural networks simple by minimizing the description length of the weights , 1993, COLT '93.

[113]  A. Glavieux,et al.  Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1 , 1993, Proceedings of ICC '93 - IEEE International Conference on Communications.

[114]  Christopher M. Bishop,et al.  Curvature-driven smoothing: a learning algorithm for feedforward networks , 1993, IEEE Trans. Neural Networks.

[115]  Radford M. Neal A new view of the EM algorithm that justifies incremental and other variants , 1993 .

[116]  N. Gordon,et al.  Novel approach to nonlinear/non-Gaussian Bayesian state estimation , 1993 .

[117]  Michael I. Jordan,et al.  Supervised learning from incomplete data via an EM approach , 1993, NIPS.

[118]  Robert Haining,et al.  Statistics for spatial data: by Noel Cressie, 1991, John Wiley & Sons, New York, 900 p., ISBN 0-471-84336-9, US $89.95 , 1993 .

[119]  Xiao-Li Meng,et al.  Maximum likelihood estimation via the ECM algorithm: A general framework , 1993 .

[120]  Joab R Winkler,et al.  Numerical recipes in C: The art of scientific computing, second edition , 1993 .

[121]  Hans L. Bodlaender,et al.  A Tourist Guide through Treewidth , 1993, Acta Cybern..

[122]  Robert Hecht-Nielsen,et al.  On the Geometry of Feedforward Neural Network Error Surfaces , 1993, Neural Computation.

[123]  C. Bishop,et al.  Analysis of multiphase flows using dual-energy gamma densitometry and neural networks , 1993 .

[124]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[125]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[126]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[127]  Yoshua Bengio,et al.  An Input Output HMM Architecture , 1994, NIPS.

[128]  Gerald Tesauro,et al.  TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[129]  Christopher M. Bishop,et al.  Novelty detection and neural network validation , 1994 .

[130]  Andrew R. Webb,et al.  Functional approximation by feed-forward networks: a least-squares approach to generalization , 1994, IEEE Trans. Neural Networks.

[131]  Wray L. Buntine,et al.  Computing second derivatives in feed-forward networks: a review , 1994, IEEE Trans. Neural Networks.

[132]  John B. Moore,et al.  Hidden Markov Models: Estimation and Control , 1994 .

[133]  Umesh V. Vazirani,et al.  An Introduction to Computational Learning Theory , 1994 .

[134]  W. Gibbs,et al.  Finite element methods , 2017, Graduate Studies in Mathematics.

[135]  L. Tierney Markov Chains for Exploring Posterior Distributions , 1994 .

[136]  Barak A. Pearlmutter Fast Exact Multiplication by the Hessian , 1994, Neural Computation.

[137]  Uffe Kjærulff,et al.  Blocking Gibbs sampling in very large probabilistic expert systems , 1995, Int. J. Hum. Comput. Stud..

[138]  Stuart J. Russell,et al.  Stochastic simulation algorithms for dynamic probabilistic networks , 1995, UAI.

[139]  Terrence J. Sejnowski,et al.  An Information-Maximization Approach to Blind Separation and Blind Deconvolution , 1995, Neural Computation.

[140]  Persi Diaconis,et al.  What do we know about the Metropolis algorithm? , 1995, STOC '95.

[141]  J. Besag,et al.  Bayesian Computation and Stochastic Systems , 1995 .

[142]  Todd K. Leen,et al.  From Data Distributions to Regularization in Invariant Learning , 1995, Neural Computation.

[143]  J. E. Jackson Statistical Factor Analysis and Related Methods: Theory and Applications , 1995 .

[144]  Yoshua Bengio,et al.  Pattern Recognition and Neural Networks , 1995 .

[145]  Walter R. Gilks,et al.  Adaptive rejection metropolis sampling , 1995 .

[146]  Christopher M. Bishop,et al.  Modelling conditional probability distributions for periodic variables , 1995 .

[147]  Christopher M. Bishop,et al.  EM Optimization of Latent-Variables Density Models , 1995, NIPS.

[148]  Christopher M. Bishop,et al.  Training with Noise is Equivalent to Tikhonov Regularization , 1995, Neural Computation.

[149]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[150]  Geoffrey E. Hinton,et al.  Bayesian Learning for Neural Networks , 1995 .

[151]  Michael Brady,et al.  Novelty detection for the identification of masses in mammograms , 1995 .

[152]  D. Mackay,et al.  Bayesian neural networks and density networks , 1995 .

[153]  Andrzej Cichocki,et al.  A New Learning Algorithm for Blind Signal Separation , 1995, NIPS.

[154]  Geoffrey E. Hinton,et al.  The EM algorithm for mixtures of factor analyzers , 1996 .

[155]  Barak A. Pearlmutter,et al.  Maximum Likelihood Blind Source Separation: A Context-Sensitive Generalization of ICA , 1996, NIPS.

[156]  Mark Jerrum,et al.  The Markov chain Monte Carlo method: an approach to approximate counting and integration , 1996 .

[157]  P. M. Williams,et al.  Using Neural Networks to Model Conditional Multivariate Densities , 1996, Neural Computation.

[158]  Geoffrey E. Hinton,et al.  Parameter estimation for linear dynamical systems , 1996 .

[159]  David Barber,et al.  Bayesian Model Comparison by Monte Carlo Chaining , 1996, NIPS.

[160]  Christopher M. Bishop,et al.  GTM: A Principled Alternative to the Self-Organizing Map , 1996, NIPS.

[161]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[162]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[163]  H. Luetkepohl The Handbook of Matrices , 1996 .

[164]  Frederick Jelinek,et al.  Statistical methods for speech recognition , 1997 .

[165]  David Barber,et al.  Ensemble Learning for Multi-Layer Networks , 1997, NIPS.

[166]  Paul W. Goldberg,et al.  Regression with Input-dependent Noise: A Gaussian Process Treatment , 1997, NIPS.

[167]  Geoffrey E. Hinton,et al.  Evaluation of Gaussian processes and other methods for non-linear regression , 1997 .

[168]  Geoffrey E. Hinton,et al.  Modeling the manifolds of images of handwritten digits , 1997, IEEE Trans. Neural Networks.

[169]  Federico Girosi,et al.  Support Vector Machines: Training and Applications , 1997 .

[170]  D. Chakrabarti,et al.  A fast fixed - point algorithm for independent component analysis , 1997 .

[171]  Christopher K. I. Williams,et al.  An upper bound on the Bayesian error bars for generalized linear regression , 1997 .

[172]  Enrique F. Castillo,et al.  Expert Systems and Probabilistic Network Models , 1996, Monographs in Computer Science.

[173]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[174]  Radford M. Neal Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification , 1997, physics/9701026.

[175]  Brendan J. Frey,et al.  A Revolution: Belief Propagation in Graphs with Cycles , 1997, NIPS.

[176]  Christopher K. I. Williams,et al.  Magnification factors for the GTM algorithm , 1997 .

[177]  Sam T. Roweis,et al.  EM Algorithms for PCA and SPCA , 1997, NIPS.

[178]  Nanda Kambhatla,et al.  Dimension Reduction by Local Principal Component Analysis , 1997, Neural Computation.

[179]  Tomaso A. Poggio,et al.  Example-Based Learning for View-Based Human Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[180]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[181]  Christopher M. Bishop,et al.  GTM: The Generative Topographic Mapping , 1998, Neural Computation.

[182]  Jung-Fu Cheng,et al.  Turbo Decoding as an Instance of Pearl's "Belief Propagation" Algorithm , 1998, IEEE J. Sel. Areas Commun..

[183]  Christopher K. I. Williams Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond , 1999, Learning in Graphical Models.

[184]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[185]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[186]  Michael E. Tipping Probabilistic Visualisation of High-Dimensional Binary Data , 1998, NIPS.

[187]  Radford M. Neal,et al.  Suppressing Random Walks in Markov Chain Monte Carlo Using Ordered Overrelaxation , 1995, Learning in Graphical Models.

[188]  Shun-ichi Amari,et al.  Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[189]  T. Ens,et al.  Blind signal separation : statistical principles , 1998 .

[190]  Brendan J. Frey,et al.  Graphical Models for Machine Learning and Digital Communication , 1998 .

[191]  J. C. BurgesChristopher A Tutorial on Support Vector Machines for Pattern Recognition , 1998 .

[192]  Charles M. Bishop,et al.  Ensemble learning in Bayesian neural networks , 1998 .

[193]  Christopher M. Bishop,et al.  Developments of the generative topographic mapping , 1998, Neurocomputing.

[194]  Jason Weston,et al.  Multi-Class Support Vector Machines , 1998 .

[195]  Xavier Boyen,et al.  Tractable Inference for Complex Stochastic Processes , 1998, UAI.

[196]  Sadik Kapadia,et al.  Discriminative Training of Hidden Markov Models , 1998 .

[197]  Charles E. McCulloch,et al.  The EM Algorithm and Its Extensions , 1998 .

[198]  Christopher K. I. Williams Computation with Infinite Neural Networks , 1998, Neural Computation.

[199]  Christopher M. Bishop,et al.  Bayesian PCA , 1998, NIPS.

[200]  Stephen P. Brooks,et al.  Markov chain Monte Carlo method and its application , 1998 .

[201]  S. Mallat A wavelet tour of signal processing , 1998 .

[202]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[203]  Christopher M. Bishop,et al.  A Hierarchical Latent Variable Model for Data Visualization , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[204]  Michael I. Jordan Graphical Models , 1998 .

[205]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[206]  David Barber,et al.  Bayesian Classification With Gaussian Processes , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[207]  David J. C. MacKay,et al.  Good Error-Correcting Codes Based on Very Sparse Matrices , 1997, IEEE Trans. Inf. Theory.

[208]  Volker Roth,et al.  Nonlinear Discriminant Analysis Using Kernel Functions , 1999, NIPS.

[209]  Hagai Attias,et al.  Independent Factor Analysis , 1999, Neural Computation.

[210]  Christopher M. Bishop,et al.  Mixtures of Probabilistic Principal Component Analyzers , 1999, Neural Computation.

[211]  Zoubin Ghahramani,et al.  A Unifying Review of Linear Gaussian Models , 1999, Neural Computation.

[212]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[213]  Nello Cristianini,et al.  Large Margin DAGs for Multiclass Classification , 1999, NIPS.

[214]  B. Scholkopf,et al.  Fisher discriminant analysis with kernels , 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468).

[215]  Nello Cristianini,et al.  Controlling the Sensitivity of Support Vector Machines , 1999 .

[216]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[217]  Thomas Hofmann,et al.  Learning the Similarity of Documents: An Information-Geometric Approach to Document Retrieval and Categorization , 1999, NIPS.

[218]  David Haussler,et al.  Convolution kernels on discrete structures , 1999 .

[219]  Manfred Opper,et al.  A Bayesian approach to on-line learning , 1999 .

[220]  J. March Introduction to the Calculus of Variations , 1999 .

[221]  Hagai Attias,et al.  Inferring Parameters and Structure of Latent Variable Models by Variational Bayes , 1999, UAI.

[222]  Eric W. Weisstein,et al.  The CRC concise encyclopedia of mathematics , 1999 .

[223]  Charles M. Bishop Variational principal components , 1999 .

[224]  Purushottam W. Laud,et al.  Bayesian Nonparametric Inference for Random Distributions and Related Functions , 1999 .

[225]  David J. Spiegelhalter,et al.  Probabilistic Networks and Expert Systems , 1999, Information Science and Statistics.

[226]  David J. C. MacKay,et al.  Comparison of Approximate Methods for Handling Hyperparameters , 1999, Neural Computation.

[227]  Zoubin Ghahramani,et al.  Variational Inference for Bayesian Mixtures of Factor Analysers , 1999, NIPS.

[228]  Michael E. Tipping,et al.  Probabilistic Principal Component Analysis , 1999 .

[229]  Daphne Koller,et al.  Restricted Bayes Optimal Classifiers , 2000, AAAI/IAAI.

[230]  Arthur E. Hoerl,et al.  Ridge Regression: Biased Estimation for Nonorthogonal Problems , 2000, Technometrics.

[231]  Hoon Kim,et al.  Monte Carlo Statistical Methods , 2000, Technometrics.

[232]  Tom Minka,et al.  Automatic Choice of Dimensionality for PCA , 2000, NIPS.

[233]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[234]  Geoffrey E. Hinton,et al.  Variational Learning for Switching State-Space Models , 2000, Neural Computation.

[235]  David J. C. MacKay,et al.  Variational Gaussian process classifiers , 2000, IEEE Trans. Neural Networks Learn. Syst..

[236]  J. Friedman Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .

[237]  John A. Bather,et al.  Decision Theory: An Introduction to Dynamic Programming and Sequential Decisions , 2000 .

[238]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[239]  Bernhard Schölkopf,et al.  New Support Vector Algorithms , 2000, Neural Computation.

[240]  Alexander J. Smola,et al.  Sparse Greedy Gaussian Process Regression , 2000, NIPS.

[241]  G. Baudat,et al.  Generalized Discriminant Analysis Using a Kernel Approach , 2000, Neural Computation.

[242]  Tommi S. Jaakkola,et al.  Tutorial on variational approximation methods , 2000 .

[243]  Christopher K. I. Williams,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[244]  P. Bartlett,et al.  Probabilities for SV Machines , 2000 .

[245]  David J.C. Mackay,et al.  Density networks , 2000 .

[246]  Yoram Singer,et al.  Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res..

[247]  Michael I. Jordan,et al.  Bayesian parameter estimation via variational methods , 2000, Stat. Comput..

[248]  Christopher M. Bishop,et al.  Non-linear Bayesian Image Modelling , 2000, ECCV.

[249]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[250]  Tom Minka,et al.  Expectation Propagation for approximate Bayesian inference , 2001, UAI.

[251]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[252]  Sujit K. Ghosh,et al.  Essential Wavelets for Statistical Applications and Data Analysis , 2001, Technometrics.

[253]  Sanjoy Dasgupta,et al.  A Generalization of Principal Components Analysis to the Exponential Family , 2001, NIPS.

[254]  Adrian Corduneanu,et al.  Variational Bayesian Model Selection for Mixture Distributions , 2001 .

[255]  Paul Zarchan,et al.  Fundamentals of Kalman Filtering: A Practical Approach , 2001 .

[256]  Antony I. T. Rowstron,et al.  Optimising Synchronisation Times for Mobile Devices , 2001, NIPS.

[257]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[258]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[259]  Yee Whye Teh,et al.  A New View of ICA , 2001 .

[260]  T. Başar,et al.  A New Approach to Linear Filtering and Prediction Problems , 2001 .

[261]  George Eastman House,et al.  Sparse Bayesian Learning and the Relevance Vector Machine , 2001 .

[262]  Stan Lipovetsky,et al.  Latent Variable Models and Factor Analysis , 2001, Technometrics.

[263]  Peter Tiño,et al.  Using Directional Curvatures to Visualize Folding Patterns of the GTM Projection Manifolds , 2001, ICANN.

[264]  W. Michael Conklin,et al.  Monte Carlo Methods in Bayesian Computation , 2001, Technometrics.

[265]  Michael E. Tipping,et al.  Analysis of Sparse Bayesian Learning , 2001, NIPS.

[266]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[267]  Ian T. Nabney,et al.  Netlab: Algorithms for Pattern Recognition , 2002 .

[268]  Eric R. Ziegel,et al.  Generalized Linear Models , 2002, Technometrics.

[269]  B. Rannala Bioinformatics: The Machine Learning Approach.Second Edition. Adaptive Computation and Machine Learning. ByPierre Baldiand, Sørenv Brunak.A Bradford Book. Cambridge (Massachusetts): MIT Press. $49.95. xxiii + 452 p; ill.; index. ISBN: 0–262–02506‐X. 2001. , 2002 .

[270]  David J. Spiegelhalter,et al.  VIBES: A Variational Inference Engine for Bayesian Networks , 2002, NIPS.

[271]  Tim Hesterberg,et al.  Monte Carlo Strategies in Scientific Computing , 2002, Technometrics.

[272]  Ole Winther,et al.  Mean-Field Approaches to Independent Component Analysis , 2002, Neural Computation.

[273]  Lehel Csató,et al.  Sparse On-Line Gaussian Processes , 2002, Neural Computation.

[274]  Jean Ponce,et al.  Computer Vision: A Modern Approach , 2002 .

[275]  Peter Tiño,et al.  Hierarchical GTM: Constructing Localized Nonlinear Projection Manifolds in a Principled Way , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[276]  Tom Heskes,et al.  Fractional Belief Propagation , 2002, NIPS.

[277]  Christopher M. Bishop,et al.  Bayesian Hierarchical Mixtures of Experts , 2002, UAI.

[278]  Thore Graepel,et al.  Solving Noisy Linear Operator Equations by Gaussian Processes: Application to Ordinary and Partial Differential Equations , 2003, ICML.

[279]  Michael I. Jordan,et al.  Hierarchical Bayesian Models for Applications in Information Retrieval , 2003 .

[280]  Jong-Hoon Ahn,et al.  A Constrained EM Algorithm for Principal Component Analysis , 2003, Neural Computation.

[281]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[282]  Matthias W. Seeger,et al.  Bayesian Gaussian process models : PAC-Bayesian generalisation error bounds and sparse approximations , 2003 .

[283]  Radford M. Neal Slice Sampling , 2003, The Annals of Statistics.

[284]  Bernhard Schölkopf,et al.  Learning to Find Pre-Images , 2003, NIPS.

[285]  Michael I. Jordan,et al.  Kernel independent component analysis , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[286]  Charles Elkan,et al.  Using the Triangle Inequality to Accelerate k-Means , 2003, ICML.

[287]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[288]  T. Speed,et al.  Biological Sequence Analysis , 1998 .

[289]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[290]  Michael E. Tipping,et al.  Fast Marginal Likelihood Maximisation for Sparse Bayesian Models , 2003, AISTATS.

[291]  Stephen J. Roberts,et al.  Variational Mixture of Bayesian Independent Component Analyzers , 2003, Neural Computation.

[292]  Terrence J. Sejnowski,et al.  Variational Bayesian Learning of ICA with Missing Data , 2003, Neural Computation.

[293]  Christopher K. I. Williams Learning Kernel Classifiers , 2003 .

[294]  David A. McAllester PAC-Bayesian Stochastic Model Selection , 2003, Machine Learning.

[295]  Fernando A. Quintana,et al.  Nonparametric Bayesian data analysis , 2004 .

[296]  James V. Stone Independent Component Analysis: A Tutorial Introduction , 2007 .

[297]  Vladimir Kolmogorov,et al.  What energy functions can be minimized via graph cuts? , 2002, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[298]  Refik Soyer,et al.  Bayesian Methods for Nonlinear Classification and Regression , 2004, Technometrics.

[299]  Teuvo Kohonen,et al.  Self-organized formation of topologically correct feature maps , 2004, Biological Cybernetics.

[300]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[301]  Nando de Freitas,et al.  An Introduction to MCMC for Machine Learning , 2004, Machine Learning.

[302]  Volker Tresp,et al.  Scaling Kernel-Based Systems to Large Data Sets , 2001, Data Mining and Knowledge Discovery.

[303]  Michael I. Jordan,et al.  An Introduction to Variational Methods for Graphical Models , 1999, Machine Learning.

[304]  Bo Thiesson,et al.  ARMA Time-Series Modeling with Graphical Models , 2004, UAI.

[305]  Michael I. Jordan,et al.  MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 2001 .

[306]  T. Minka Power EP , 2004 .

[307]  G. Wahba,et al.  Multicategory Support Vector Machines , Theory , and Application to the Classification of Microarray Data and Satellite Radiance Data , 2003 .

[308]  Michael Isard,et al.  CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[309]  Klaus Schulten,et al.  Self-organizing maps: ordering, convergence properties and energy functions , 1992, Biological Cybernetics.

[310]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[311]  Larry Wasserman,et al.  All of Statistics: A Concise Course in Statistical Inference , 2004 .

[312]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[313]  Nir Friedman,et al.  Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks , 2004, Machine Learning.

[314]  Christopher M. Bishop,et al.  Robust Bayesian Mixture Modelling , 2005, ESANN.

[315]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[316]  Christopher M. Bishop,et al.  Distinguishing text from graphics in on-line handwritten ink , 2004, Ninth International Workshop on Frontiers in Handwriting Recognition.

[317]  H. Bourlard,et al.  Auto-association by multilayer perceptrons and singular value decomposition , 1988, Biological Cybernetics.

[318]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.

[319]  M. Tribus,et al.  Probability theory: the logic of science , 2003 .

[320]  Stephen J. Roberts,et al.  An Anthology of Probabilistic Models for Medical Informatics , 2005 .

[321]  Andrew Blake,et al.  Sparse Bayesian learning for efficient visual tracking , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[322]  Carl E. Rasmussen,et al.  Assessing Approximations for Gaussian Process Classification , 2005, NIPS.

[323]  Charles M. Bishop,et al.  Variational Message Passing , 2005, J. Mach. Learn. Res..

[324]  A. Rollett,et al.  The Monte Carlo Method , 2004 .

[325]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[326]  Thomas P. Minka,et al.  Divergence measures and message passing , 2005 .

[327]  Martin J. Wainwright,et al.  A new class of upper bounds on the log partition function , 2002, IEEE Transactions on Information Theory.

[328]  V. Vapnik Estimation of Dependences Based on Empirical Data , 2006 .

[329]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[330]  Tony Jebara,et al.  Machine learning: Discriminative and generative , 2006 .

[331]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[332]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[333]  Tom Minka,et al.  Principled Hybrids of Generative and Discriminative Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[334]  Murray A. Jorgensen Iteratively Reweighted Least Squares , 2006 .

[335]  H. Robbins A Stochastic Approximation Method , 1951 .

[336]  M. V. Velzen,et al.  Self-organizing maps , 2007 .

[337]  T. Hastie,et al.  Principal Curves , 2007 .

[338]  David Hinkley,et al.  Bootstrap Methods: Another Look at the Jackknife , 2008 .

[339]  Lakhmi C. Jain,et al.  Introduction to Bayesian Networks , 2008 .

[340]  Sunita Sarawagi Learning with Graphical Models , 2008 .

[341]  P. Deb Finite Mixture Models , 2008 .

[342]  Iain Murray,et al.  Introduction To Gaussian Processes , 2008 .

[343]  S. E. Ahmed,et al.  Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference , 2008, Technometrics.

[344]  Jan de Leeuw,et al.  Journal of Statistical Software , 2009 .

[345]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[346]  R. Shah,et al.  Least Squares Support Vector Machines , 2022 .

[347]  K. Schittkowski,et al.  NONLINEAR PROGRAMMING , 2022 .