Machine Learning: A Probabilistic Perspective

Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package, PMTK (probabilistic modeling toolkit), that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.


[310]  P. Donnelly,et al.  Inference of population structure using multilocus genotype data. , 2000, Genetics.

[311]  Kevin P. Murphy,et al.  Linear-time inference in Hierarchical HMMs , 2001, NIPS.

[312]  Thomas L. Griffiths,et al.  Using Vocabulary Knowledge in Bayesian Multinomial Estimation , 2001, NIPS.

[313]  P. Bühlmann,et al.  Boosting with the L2-loss: regression and classification , 2001 .

[314]  Tom Minka,et al.  Expectation Propagation for approximate Bayesian inference , 2001, UAI.

[315]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[316]  T. Minka Inferring a Gaussian distribution , 2001 .

[317]  Sumit Basu,et al.  Learning Human Interactions with the Influence Model , 2001, NIPS 2001.

[318]  Jacob Goldberger,et al.  Sequentially finding the N-Best List in Hidden Markov Models , 2001, IJCAI.

[319]  William T. Freeman,et al.  Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology , 1999, Neural Computation.

[320]  J. Rosenthal,et al.  Optimal scaling for various Metropolis-Hastings algorithms , 2001 .

[321]  Xiao-Li Meng,et al.  The Art of Data Augmentation , 2001 .

[322]  Radford M. Neal Annealed importance sampling , 1998, Stat. Comput..

[323]  Steffen L. Lauritzen,et al.  Representing and Solving Decision Problems with Limited Information , 2001, Manag. Sci..

[324]  William T. Freeman,et al.  On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs , 2001, IEEE Trans. Inf. Theory.

[325]  Kenneth Rose,et al.  Deterministically annealed design of hidden Markov model speech recognizers , 2001, IEEE Trans. Speech Audio Process..

[326]  W. Gilks,et al.  Following a moving target—Monte Carlo inference for dynamic Bayesian models , 2001 .

[327]  Daphne Koller,et al.  Sampling in Factored Dynamic Systems , 2001, Sequential Monte Carlo Methods in Practice.

[328]  Nir Friedman,et al.  Context-specific Bayesian clustering for gene expression data , 2001, J. Comput. Biol..

[329]  Tal Pupko,et al.  A structural EM algorithm for phylogenetic inference , 2001, J. Comput. Biol..

[330]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[331]  Carl E. Rasmussen,et al.  Factorial Hidden Markov Models , 1997 .

[332]  L. Brown,et al.  Interval Estimation for a Binomial Proportion , 2001 .

[333]  Shigeo Abe Pattern Classification , 2001, Springer London.

[334]  Yee Whye Teh,et al.  Discovering Multiple Constraints that are Frequently Approximately Satisfied , 2001, UAI.

[335]  Harry Shum,et al.  Image segmentation by data driven Markov chain Monte Carlo , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[336]  Robert Kohn,et al.  Nonparametric regression using linear combinations of basis functions , 2001, Stat. Comput..

[337]  Steffen L. Lauritzen,et al.  Causal Inference from Graphical Models , 2001 .

[338]  Jianbo Shi,et al.  A Random Walks View of Spectral Segmentation , 2001, AISTATS.

[339]  Javier R. Movellan,et al.  Diffusion Networks, Products of Experts, and Factor Analysis , 2001 .

[340]  Chalee Asavathiratham,et al.  The influence model: a tractable representation for the dynamics of networked Markov chains , 2001 .

[341]  Arnaud Doucet,et al.  Particle filters for state estimation of jump Markov linear systems , 2001, IEEE Trans. Signal Process..

[342]  Yee Whye Teh,et al.  Belief Optimization for Binary Networks: A Stable Alternative to Loopy Belief Propagation , 2001, UAI.

[343]  M. J. Bayarri,et al.  Calibration of p Values for Testing Precise Null Hypotheses , 2001 .

[344]  Edward I. George,et al.  The Practical Implementation of Bayesian Model Selection , 2001 .

[345]  Keiji Nagatani,et al.  Topological simultaneous localization and mapping (SLAM): toward exact localization without explicit localization , 2001, IEEE Trans. Robotics Autom..

[346]  Eric Vigoda,et al.  A polynomial-time approximation algorithm for the permanent of a matrix with non-negative entries , 2001, STOC '01.

[347]  M. Opper,et al.  Advanced mean field methods: theory and practice , 2001 .

[348]  T. Snijders,et al.  Estimation and Prediction for Stochastic Blockstructures , 2001 .

[349]  Tom Minka,et al.  A family of algorithms for approximate Bayesian inference , 2001 .

[350]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[351]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[352]  Uri Lerner,et al.  Inference in Hybrid Networks: Theoretical Limits and Practical Algorithms , 2001, UAI.

[353]  Michael I. Jordan,et al.  Thin Junction Trees , 2001, NIPS.

[354]  Nathan Srebro,et al.  Maximum likelihood bounded tree-width Markov networks , 2001, Artif. Intell..

[355]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[356]  Michael E. Tipping Sparse Bayesian Learning and the Relevance Vector Machine , 2001 .

[357]  Bin Yu,et al.  Model Selection and the Principle of Minimum Description Length , 2001 .

[358]  M. Opper,et al.  Comparing the Mean Field Method and Belief Propagation for Approximate Inference in MRFs , 2001 .

[359]  Leo Breiman,et al.  Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author) , 2001, Statistical Science.

[360]  Mats G. Gustafsson,et al.  A Probabilistic Derivation of the Partial Least-Squares Algorithm , 2001, J. Chem. Inf. Comput. Sci..

[361]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[362]  Nando de Freitas,et al.  Robust Full Bayesian Learning for Radial Basis Networks , 2001, Neural Computation.

[363]  David Maxwell Chickering,et al.  Optimal Structure Identification With Greedy Search , 2003, J. Mach. Learn. Res..

[364]  Ian T. Nabney,et al.  Netlab: Algorithms for Pattern Recognition , 2002 .

[365]  Adrian E. Raftery,et al.  Model-Based Clustering, Discriminant Analysis, and Density Estimation , 2002 .

[366]  Yee Whye Teh,et al.  An Alternate Objective Function for Markovian Fields , 2002, ICML.

[367]  Wray L. Buntine Variational Extensions to EM and Multinomial PCA , 2002, ECML.

[368]  Nevin Lianwen Zhang,et al.  Hierarchical latent class models for cluster analysis , 2002, J. Mach. Learn. Res..

[369]  John Shawe-Taylor,et al.  String Kernels, Fisher Kernels and Finite State Automata , 2002, NIPS.

[370]  Bo Thiesson,et al.  Staged Mixture Modelling and Boosting , 2002, UAI.

[371]  Sebastian Thrun,et al.  Probabilistic robotics , 2002, CACM.

[372]  Darren J. Wilkinson,et al.  Conditional simulation from highly structured Gaussian systems, with application to blocking-MCMC for the Bayesian analysis of very large linear models , 2002, Stat. Comput..

[373]  Peter E. Rossi,et al.  Bayesian Statistics and Marketing , 2005 .

[374]  A. Dawid Influence Diagrams for Causal Modelling and Inference , 2002 .

[375]  Alexander J. Smola,et al.  Fast Kernels for String and Tree Matching , 2002, NIPS.

[376]  Jean Ponce,et al.  Computer Vision: A Modern Approach , 2002 .

[377]  Rob Malouf,et al.  A Comparison of Algorithms for Maximum Entropy Parameter Estimation , 2002, CoNLL.

[378]  X. Jin Factor graphs and the Sum-Product Algorithm , 2002 .

[379]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[380]  A. Roverato Hyper Inverse Wishart Distribution for Non-decomposable Graphs and its Application to Bayesian Inference for Gaussian Graphical Models , 2002 .

[381]  Anil K. Jain,et al.  Unsupervised Learning of Finite Mixture Models , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[382]  A. Satorra Structural Equation Models with Latent Variables , 2002 .

[383]  Nanning Zheng,et al.  Stereo Matching Using Belief Propagation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[384]  Dan Geiger,et al.  Exact genetic linkage computations for general pedigrees , 2002, ISMB.

[385]  Mark J. F. Gales Maximum likelihood multiple subspace projections for hidden Markov models , 2002, IEEE Trans. Speech Audio Process..

[386]  Svetha Venkatesh,et al.  Policy Recognition in the Abstract Hidden Markov Model , 2002 .

[387]  Thomas P. Minka,et al.  Bayesian model averaging is not model combination , 2002 .

[388]  Matthew J. Beal Variational algorithms for approximate Bayesian inference , 2003 .

[389]  Christopher M. Bishop,et al.  Bayesian Hierarchical Mixtures of Experts , 2002, UAI.

[390]  William T. Freeman,et al.  Nonparametric belief propagation , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[391]  William T. Freeman,et al.  Understanding belief propagation and its generalizations , 2003 .

[392]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[393]  Mark A. Paskin,et al.  Thin Junction Tree Filters for Simultaneous Localization and Mapping , 2002, IJCAI.

[394]  David Haussler,et al.  Combining phylogenetic and hidden Markov models in biosequence analysis , 2003, RECOMB '03.

[395]  Tom Burr,et al.  Causation, Prediction, and Search , 2003, Technometrics.

[396]  Yuan Qi,et al.  Tree-structured Approximations by Expectation Propagation , 2003, NIPS.

[397]  Andrew McCallum,et al.  Efficiently Inducing Features of Conditional Random Fields , 2002, UAI.

[398]  Matthew West,et al.  Bayesian factor regression models in the''large p , 2003 .

[399]  Jong-Hoon Ahn,et al.  A Constrained EM Algorithm for Principal Component Analysis , 2003, Neural Computation.

[400]  Johann Schumann,et al.  Autobayes: a System for Generating Data Analysis Programs from Statistical Models , 2022 .

[401]  Tai Sing Lee,et al.  Hierarchical Bayesian inference in the visual cortex. , 2003, Journal of the Optical Society of America. A, Optics, image science, and vision.

[402]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[403]  Brendan J. Frey,et al.  Extending Factor Graphs so as to Unify Directed and Undirected Graphical Models , 2002, UAI.

[404]  Ruslan Salakhutdinov,et al.  Adaptive Overrelaxed Bound Optimization Methods , 2003, ICML.

[405]  G. Roberts,et al.  Bayesian Inference For Nondecomposable Graphical Gaussian Models. , 2003 .

[406]  Radford M. Neal Slice Sampling , 2003, The Annals of Statistics.

[407]  Tommi S. Jaakkola,et al.  Weighted Low-Rank Approximations , 2003, ICML.

[408]  Michael Isard,et al.  PAMPAS: real-valued graphical models for computer vision , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[409]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[410]  Benjamin M. Marlin,et al.  Modeling User Rating Profiles For Collaborative Filtering , 2003, NIPS.

[411]  Charles Elkan,et al.  Using the Triangle Inequality to Accelerate k-Means , 2003, ICML.

[412]  Gal Chechik,et al.  Information Bottleneck for Gaussian Variables , 2003, J. Mach. Learn. Res..

[413]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[414]  Tom Heskes,et al.  Task Clustering and Gating for Bayesian Multitask Learning , 2003, J. Mach. Learn. Res..

[415]  Peter Carbonetto Unsupervised Statistical Models for General Object Recognition , 2003 .

[416]  Martyn Plummer,et al.  JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling , 2003 .

[417]  Timothy J. Robinson,et al.  Sequential Monte Carlo Methods in Practice , 2003 .

[418]  Bart De Moor,et al.  Biclustering microarray data by Gibbs sampling , 2003, ECCB.

[419]  D. Heckerman,et al.  Density Modeling and Clustering Using Dirichlet Diffusion Trees , 2003 .

[420]  N. Sloane,et al.  Acyclic Digraphs and Eigenvalues of (0,1)-Matrices , 2003, math/0310423.

[421]  David R. Karger,et al.  Tackling the Poor Assumptions of Naive Bayes Text Classifiers , 2003, ICML.

[422]  Y. Guédon Estimating Hidden Semi-Markov Chains From Discrete Sequences , 2003 .

[423]  Claus Dethlefsen,et al.  deal: A Package for Learning Bayesian Networks , 2003 .

[424]  Christopher K. I. Williams,et al.  An isotropic Gaussian mixture can have more modes than components , 2003 .

[425]  Alexa T. McCray,et al.  An Upper-Level Ontology for the Biomedical Domain , 2003, Comparative and functional genomics.

[426]  R. Kohn,et al.  Efficient estimation of covariance selection models , 2003 .

[427]  Luc Van Gool,et al.  An adaptive color-based particle filter , 2003, Image Vis. Comput..

[428]  Martin Zinkevich,et al.  Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.

[429]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[430]  Michael E. Tipping,et al.  Fast Marginal Likelihood Maximisation for Sparse Bayesian Models , 2003, AISTATS.

[431]  Martial Hebert,et al.  Discriminative random fields: a discriminative framework for contextual interaction in classification , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[432]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[433]  Martin J. Wainwright,et al.  Tree-based reparameterization framework for analysis of sum-product and related algorithms , 2003, IEEE Trans. Inf. Theory.

[434]  Edward C. Chao,et al.  Generalized Estimating Equations , 2003, Technometrics.

[435]  Nicole A. Lazar,et al.  Statistical Analysis With Missing Data , 2003, Technometrics.

[436]  H. Kushner,et al.  Stochastic Approximation and Recursive Algorithms and Applications , 2003 .

[437]  Alan L. Yuille,et al.  The Concave-Convex Procedure , 2003, Neural Computation.

[438]  Kunihiko Fukushima,et al.  Cognitron: A self-organizing multilayered neural network , 1975, Biological Cybernetics.

[439]  Gunnar Rätsch,et al.  Soft Margins for AdaBoost , 2001, Machine Learning.

[440]  Michael A. West,et al.  Experiments in Stochastic Computation for High-Dimensional Graphical Models , 2005 .

[441]  Thomas L. Griffiths,et al.  Integrating Topics and Syntax , 2004, NIPS.

[442]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[443]  Mu Zhu The Counter-intuitive Non-informative Prior for the Bernoulli Family , 2004 .

[444]  C. Robert,et al.  Optimal Sample Size for Multiple Testing: the Case of Gene Expression Microarrays , 2004 .

[445]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[446]  Ben Taskar,et al.  Max-Margin Parsing , 2004, EMNLP.

[447]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[448]  Geoffrey E. Hinton,et al.  Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[449]  Alan M. Frieze,et al.  Clustering Large Graphs via the Singular Value Decomposition , 2004, Machine Learning.

[450]  David Heckerman,et al.  Probabilistic Models for Relational Data , 2004 .

[451]  Anil K. Jain,et al.  Simultaneous feature selection and clustering using mixture models , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[452]  Juha Karhunen,et al.  Accelerating Cyclic Update Algorithms for Parameter Estimation by Pattern Searches , 2003, Neural Processing Letters.

[453]  Larry S. Davis,et al.  Efficient Kernel Machines Using the Improved Fast Gauss Transform , 2004, NIPS.

[454]  Vladimir Kolmogorov,et al.  What energy functions can be minimized via graph cuts? , 2002, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[455]  Refik Soyer,et al.  Bayesian Methods for Nonlinear Classification and Regression , 2004, Technometrics.

[456]  Jeff A. Bilmes,et al.  Graphical models and automatic speech recognition , 2002 .

[457]  Tim Hesterberg,et al.  Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control , 2004, Technometrics.

[458]  Mikko Koivisto,et al.  Exact Bayesian Structure Discovery in Bayesian Networks , 2004, J. Mach. Learn. Res..

[459]  John F. Canny,et al.  GaP: a factor model for discrete data , 2004, SIGIR '04.

[460]  Aleks Jakulin,et al.  Applying Discrete PCA in Data Analysis , 2004, UAI.

[461]  Leslie Pack Kaelbling,et al.  Representing hierarchical POMDPs as DBNs for multi-scale robot localization , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.

[462]  Nando de Freitas,et al.  An Introduction to MCMC for Machine Learning , 2004, Machine Learning.

[463]  Michael I. Jordan,et al.  An Introduction to Variational Methods for Graphical Models , 1999, Machine Learning.

[464]  J. Berger,et al.  Optimal predictive model selection , 2004, math/0406464.

[465]  Robert D. Nowak,et al.  Likelihood based hierarchical clustering , 2004, IEEE Transactions on Signal Processing.

[466]  Pedro M. Domingos,et al.  On the Optimality of the Simple Bayesian Classifier under Zero-One Loss , 1997, Machine Learning.

[467]  Michael I. Jordan,et al.  MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 2001 .

[468]  ChengXiang Zhai,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[469]  Jason Weston,et al.  Mismatch string kernels for discriminative protein classification , 2004, Bioinform..

[470]  Jay K. Dow,et al.  Multinomial probit and multinomial logit: a comparison of choice models for voting research , 2004 .

[471]  D. Hunter,et al.  A Tutorial on MM Algorithms , 2004 .

[472]  Michael A. West,et al.  Bayesian Model Assessment in Factor Analysis , 2004 .

[473]  Antti Honkela,et al.  Variational learning and bits-back coding: an information-theoretic view to Bayesian learning , 2004, IEEE Transactions on Neural Networks.

[474]  Michael Isard,et al.  CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[475]  Deepayan Sarkar,et al.  Detecting differential gene expression with a semiparametric hierarchical mixture method. , 2004, Biostatistics.

[476]  J. Lafferty,et al.  Mixed-membership models of scientific publications , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[477]  Stuart J. Russell,et al.  Adaptive Probabilistic Networks with Hidden Variables , 1997, Machine Learning.

[478]  Peter Sollich,et al.  Bayesian Methods for Support Vector Machines: Evidence and Predictive Class Probabilities , 2002, Machine Learning.

[479]  M. Pourahmadi Simultaneous Modelling of Covariance Matrices: GLM, Bayesian and Nonparametric Perspectives , 2004 .

[480]  Robert Tibshirani,et al.  The Entire Regularization Path for the Support Vector Machine , 2004, J. Mach. Learn. Res..

[481]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[482]  Bernhard Schölkopf,et al.  Training Invariant Support Vector Machines , 2002, Machine Learning.

[483]  Nir Friedman,et al.  Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks , 2004, Machine Learning.

[484]  Nir Friedman,et al.  Bayesian Network Classifiers , 1997, Machine Learning.

[485]  Larry Wasserman,et al.  All of Statistics , 2004 .

[486]  Nando de Freitas,et al.  Diagnosis by a waiter and a Mars explorer , 2004, Proceedings of the IEEE.

[487]  Nicolas Le Roux,et al.  Learning Eigenfunctions Links Spectral Embedding and Kernel PCA , 2004, Neural Computation.

[488]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[489]  E. Learned-Miller Hyperspacings and the estimation of information theoretic quantities , 2004 .

[490]  B. Ripley,et al.  Semiparametric Regression: Preface , 2003 .

[491]  T. Minka A comparison of numerical optimizers for logistic regression , 2004 .

[492]  P. Bickel,et al.  Some theory for Fisher's linear discriminant function , 2004 .

[493]  Arlindo L. Oliveira,et al.  Biclustering algorithms for biological data analysis: a survey , 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[494]  R. Tibshirani,et al.  On the “degrees of freedom” of the lasso , 2007, 0712.0881.

[495]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[496]  Cleve B. Moler,et al.  Numerical computing with MATLAB , 2004 .

[497]  Ji Zhu,et al.  Boosting as a Regularized Path to a Maximum Margin Classifier , 2004, J. Mach. Learn. Res..

[498]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.

[499]  Yoram Singer,et al.  The Hierarchical Hidden Markov Model: Analysis and Applications , 1998, Machine Learning.

[500]  Sebastian Thrun,et al.  FastSLAM: An Efficient Solution to the Simultaneous Localization And Mapping Problem with Unknown Data , 2004 .

[501]  Christopher K. I. Williams On a Connection between Kernel PCA and Metric Multidimensional Scaling , 2004, Machine Learning.

[502]  Tony Jebara,et al.  Probability Product Kernels , 2004, J. Mach. Learn. Res..

[503]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[504]  Nikos A. Vlassis,et al.  Perseus: Randomized Point-based Value Iteration for POMDPs , 2005, J. Artif. Intell. Res..

[505]  M. Tribus,et al.  Probability theory: the logic of science , 2003 .

[506]  Marina Meila,et al.  Comparing clusterings: an axiomatic view , 2005, ICML.

[507]  Paul Fearnhead,et al.  Exact Bayesian curve fitting and signal segmentation , 2005, IEEE Transactions on Signal Processing.

[508]  K. Sachs,et al.  Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data , 2005, Science.

[509]  Simon Rogers,et al.  Hierarchic Bayesian models for kernel learning , 2005, ICML.

[510]  Michael W Deem,et al.  Parallel tempering: theory, applications, and new perspectives. , 2005, Physical chemistry chemical physics : PCCP.

[511]  R. Schapire The Strength of Weak Learnability , 1990, Machine Learning.

[512]  Melvin J. Hinich,et al.  Time Series Analysis by State Space Methods , 2001 .

[513]  Yuan Qi,et al.  Bayesian Conditional Random Fields , 2005, AISTATS.

[514]  J. Tenenbaum,et al.  Structure and strength in causal induction , 2005, Cognitive Psychology.

[515]  J. Ross Quinlan,et al.  Learning logical definitions from relations , 1990, Machine Learning.

[516]  Aleks Jakulin,et al.  Discrete Component Analysis , 2005, SLSFS.

[517]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[518]  A. Atay-Kayis,et al.  A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models , 2005 .

[519]  D. Pe’er Bayesian Network Analysis of Signaling Networks: A Primer , 2005, Science's STKE.

[520]  Carl E. Rasmussen,et al.  Assessing Approximate Inference for Binary Gaussian Process Classification , 2005, J. Mach. Learn. Res..

[521]  D. Hunter,et al.  Variable Selection using MM Algorithms. , 2005, Annals of statistics.

[522]  Christopher M. Bishop,et al.  Variational Message Passing , 2005, J. Mach. Learn. Res..

[523]  Samuel Kaski,et al.  On Discriminative Joint Density Modeling , 2005, ECML.

[524]  M. T. Johnson,et al.  Capacity and complexity of HMM duration modeling techniques , 2005, IEEE Signal Processing Letters.

[525]  Max Welling,et al.  Learning in Markov Random Fields: An Empirical Study , 2005 .

[526]  Stephen J. Wright,et al.  Simultaneous Variable Selection , 2005, Technometrics.

[527]  Michael I. Jordan,et al.  A Probabilistic Interpretation of Canonical Correlation Analysis , 2005 .

[528]  Rajat Raina,et al.  Abstract , 1997, Veterinary Record.

[529]  Neil D. Lawrence,et al.  Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models , 2005, J. Mach. Learn. Res..

[530]  Yee Whye Teh,et al.  Structured Region Graphs: Morphing EP into GBP , 2005, UAI.

[531]  Tong Zhang,et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[532]  K. Strimmer,et al.  A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics , 2011, Statistical Applications in Genetics and Molecular Biology.

[533]  Thomas P. Minka,et al.  Divergence measures and message passing , 2005 .

[534]  Zoubin Ghahramani,et al.  A note on the evidence and Bayesian Occam's razor , 2005 .

[535]  E. Fokoue,et al.  Mixtures of factor analyzers: an extension with covariates , 2005 .

[536]  Shie Mannor,et al.  Reinforcement learning with Gaussian processes , 2005, ICML.

[537]  Katherine A. Heller,et al.  Bayesian hierarchical clustering , 2005, ICML.

[538]  Eric Horvitz,et al.  Prediction, Expectation, and Surprise: Methods, Designs, and Study of a Deployed Traffic Forecasting Service , 2005, UAI.

[539]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[540]  Ronald A. Howard,et al.  Influence Diagrams , 2005, Decis. Anal..

[541]  Dmitry M. Malioutov,et al.  Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation , 2005, NIPS.

[542]  Max Welling,et al.  Learning in Markov Random Fields with Contrastive Free Energies , 2005, AISTATS.

[543]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[544]  Charles Elkan,et al.  Deriving TF-IDF as a Fisher Kernel , 2005, SPIRE.

[545]  David Kauchak,et al.  Modeling word burstiness using the Dirichlet distribution , 2005, ICML.

[546]  John D. Lafferty,et al.  Correlated Topic Models , 2005, NIPS.

[547]  Yair Weiss,et al.  Globally optimal solutions for energy minimization in stereo vision using reweighted belief propagation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[548]  Ali Esmaili,et al.  Probability and Random Processes , 2005, Technometrics.

[549]  Sheila A. McIlraith,et al.  Partition-based logical reasoning for first-order and propositional theories , 2005, Artif. Intell..

[550]  Ian H. Jermyn Invariant Bayesian estimation on manifolds , 2005, The Annals of Statistics.

[551]  Martin J. Wainwright,et al.  A new class of upper bounds on the log partition function , 2002, IEEE Transactions on Information Theory.

[552]  Leonhard Held,et al.  Gaussian Markov Random Fields: Theory and Applications , 2005 .

[553]  Radford M. Neal,et al.  High Dimensional Classification with Bayesian Neural Networks and Dirichlet Diffusion Trees , 2006, Feature Extraction.

[554]  Vikash K. Mansinghka,et al.  Learning Cross-cutting Systems of Categories , 2006 .

[555]  David Barber,et al.  Unified Inference for Variational Bayesian Linear Gaussian State-Space Models , 2006, NIPS.

[556]  Yuan Qi,et al.  Parameter Expanded Variational Bayesian Methods , 2006, NIPS.

[557]  Manfred Opper,et al.  A Bayesian Approach to Online Learning , 2006 .

[558]  Matthew J. Beal,et al.  Variational Bayesian learning of directed graphical models with hidden variables , 2006 .

[559]  Masoud Nikravesh,et al.  Feature Extraction - Foundations and Applications , 2006, Feature Extraction.

[560]  Michael I. Jordan,et al.  Convexity, Classification, and Risk Bounds , 2006 .

[561]  Daniel Tarlow,et al.  Using Combinatorial Optimization within Max-Product Belief Propagation , 2006, NIPS.

[562]  Michael I. Jordan,et al.  Nonparametric empirical Bayes for the Dirichlet process mixture model , 2006, Stat. Comput..

[563]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[564]  Mehran Sahami,et al.  A web-based kernel function for measuring the similarity of short text snippets , 2006, WWW '06.

[565]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[566]  Sylvia Frühwirth-Schnatter,et al.  Finite Mixture and Markov Switching Models , 2006 .

[567]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[568]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[569]  D. Balding A tutorial on statistical methods for population association studies , 2006, Nature Reviews Genetics.

[570]  Hans-Peter Kriegel,et al.  Supervised probabilistic principal component analysis , 2006, KDD '06.

[571]  Hilbert J. Kappen,et al.  The Cluster Variation Method for Efficient Linkage Analysis on Extended Pedigrees , 2006, BMC Bioinformatics.

[572]  Shunzheng Yu,et al.  Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model , 2006, IEEE Transactions on Signal Processing.

[573]  Frank Dellaert,et al.  MCMC Data Association and Sparse Factorization Updating for Real Time Multitarget Tracking with Merged and Multiple Measurements , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[574]  Edward I. George,et al.  Bayesian Ensemble Learning , 2006, NIPS.

[575]  Emmanuel J. Candès,et al.  Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information , 2004, IEEE Transactions on Information Theory.

[576]  Yee Whye Teh,et al.  A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes , 2006, ACL.

[577]  Andrew Fitzgibbon,et al.  Gaussian Process Implicit Surfaces , 2006 .

[578]  Mikko Koivisto,et al.  Advances in Exact Bayesian Structure Discovery in Bayesian Networks , 2006, UAI.

[579]  Haikady N. Nagaraja,et al.  Inference in Hidden Markov Models , 2006, Technometrics.

[580]  N. Meinshausen,et al.  High-dimensional graphs and variable selection with the Lasso , 2006, math/0608017.

[581]  P. Bühlmann,et al.  Sparse Boosting , 2006, J. Mach. Learn. Res..

[582]  M. West,et al.  The use of unlabelled data in predictive modelling , 2006 .

[583]  Thomas L. Griffiths,et al.  Learning Systems of Concepts with an Infinite Relational Model , 2006, AAAI.

[584]  Ian McGraw,et al.  Residual Belief Propagation: Informed Scheduling for Asynchronous Message Passing , 2006, UAI.

[585]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[586]  Mark Girolami,et al.  Variational Bayesian Multinomial Probit Regression with Gaussian Process Priors , 2006, Neural Computation.

[587]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[588]  Rich Caruana,et al.  An empirical comparison of supervised learning algorithms , 2006, ICML.

[589]  H. Zou The Adaptive Lasso and Its Oracle Properties , 2006 .

[590]  Pam Frost Gorder Neural Networks Show New Promise for Machine Vision , 2006, Computing in Science & Engineering.

[591]  Gábor Lugosi,et al.  Prediction, learning, and games , 2006 .

[592]  John D. Lafferty,et al.  Dynamic topic models , 2006, ICML.

[593]  Dmitry M. Malioutov,et al.  Walk-Sums and Belief Propagation in Gaussian Graphical Models , 2006, J. Mach. Learn. Res..

[594]  Erik B. Sudderth Graphical models for visual object recognition and tracking , 2006 .

[595]  Sébastien Roch,et al.  A short proof that phylogenetic tree reconstruction by maximum likelihood is hard , 2005, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[596]  Milos Hauskrecht,et al.  Noisy-OR Component Analysis and its Application to Link Analysis , 2006, J. Mach. Learn. Res..

[597]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[598]  C. Holmes,et al.  Bayesian auxiliary variable models for binary and multinomial regression , 2006 .

[599]  Kurt Bryan,et al.  The $25,000,000,000 Eigenvector: The Linear Algebra behind Google , 2006, SIAM Rev..

[600]  Amy Nicole Langville,et al.  Updating Markov Chains with an Eye on Google's PageRank , 2005, SIAM J. Matrix Anal. Appl..

[601]  Hans-Peter Kriegel,et al.  Infinite Hidden Relational Models , 2006, UAI.

[602]  Mu Zhu,et al.  Automatic dimensionality selection from the scree plot via the use of profile likelihood , 2006, Comput. Stat. Data Anal..

[603]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[604]  Vladimir Kolmogorov,et al.  Convergent Tree-Reweighted Message Passing for Energy Minimization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[605]  Jing Yu,et al.  Computational Inference of Neural Information Flow Networks , 2006, PLoS Comput. Biol..

[606]  Chris Wiggins,et al.  ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context , 2004, BMC Bioinformatics.

[607]  Daphne Koller,et al.  Efficient Structure Learning of Markov Networks using L1-Regularization , 2006, NIPS.

[608]  Tomi Silander,et al.  A Simple Approach for Finding the Globally Optimal Bayesian Network Structure , 2006, UAI.

[609]  Tom Minka,et al.  Principled Hybrids of Generative and Discriminative Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[610]  D. Heckerman,et al.  A Bayesian Approach to Causal Discovery , 2006 .

[611]  Gareth O. Roberts,et al.  Robust Markov chain Monte Carlo Methods for Spatial Generalized Linear Mixed Models , 2006 .

[612]  David Barber,et al.  Expectation Correction for Smoothed Inference in Switching Linear Dynamical Systems , 2006, J. Mach. Learn. Res..

[613]  S. Harnad Symbol grounding problem , 1991, Scholarpedia.

[614]  Max Welling,et al.  Products of Experts , 2007 .

[615]  J. Lafferty,et al.  Sparse additive models , 2007, 0711.4555.

[616]  Geoffrey E. Hinton,et al.  Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[617]  J. Tenenbaum,et al.  Word learning as Bayesian inference. , 2007, Psychological review.

[618]  荒木 望 A Study on the Application of the Unscented Kalman Filter to Measurement , 2007 .

[619]  Kristine L. Bell,et al.  A Tutorial on Particle Filters for Online Nonlinear/NonGaussian Bayesian Tracking , 2007 .

[620]  M. Yuan,et al.  Model selection and estimation in the Gaussian graphical model , 2007 .

[621]  Hal Daumé,et al.  Fast search for Dirichlet process mixture models , 2007, AISTATS.

[622]  Daniel M. Roy,et al.  AClass: An online algorithm for generative classification , 2007 .

[623]  Erik B. Sudderth,et al.  Loop Series and Bethe Variational Bounds in Attractive Graphical Models , 2007, NIPS.

[624]  Peng Zhao,et al.  Stagewise Lasso , 2007, J. Mach. Learn. Res..

[625]  Deepak Agarwal,et al.  Predictive discrete latent factor models for large scale dyadic data , 2007, KDD '07.

[626]  Mário A. T. Figueiredo,et al.  Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems , 2007, IEEE Journal of Selected Topics in Signal Processing.

[627]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[628]  Eric Moulines,et al.  Online EM Algorithm for Latent Data Models , 2007, ArXiv.

[629]  Brendan J. Frey,et al.  Bayesian Inference of MicroRNA Targets from Sequence and Expression Data , 2007, J. Comput. Biol..

[630]  Persi Diaconis,et al.  Dynamical Bias in the Coin Toss , 2007, SIAM Rev..

[631]  Andrew McCallum,et al.  Improved Dynamic Schedules for Belief Propagation , 2007, UAI.

[632]  Henry A. Kautz,et al.  Learning and inferring transportation routines , 2004, Artif. Intell..

[633]  Thomas Hofmann,et al.  TrueSkill™: A Bayesian Skill Rating System , 2007 .

[634]  David B. Dunson,et al.  Bayesian Structural Equation Modeling , 2007 .

[635]  Adrian E. Raftery,et al.  Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering , 2007, J. Classif..

[636]  Byron Boots,et al.  A Constraint Generation Approach to Learning Stable Linear Dynamical Systems , 2007, NIPS.

[637]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[638]  Tommi S. Jaakkola,et al.  Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations , 2007, NIPS.

[639]  Stephen P. Boyd,et al.  Enhancing Sparsity by Reweighted ℓ1 Minimization , 2007, 0711.1612.

[640]  Jason Weston,et al.  Large-scale kernel machines , 2007 .

[641]  Stephen P. Brooks,et al.  Assessing Convergence of Markov Chain Monte Carlo Algorithms , 2007 .

[642]  David B. Dunson,et al.  Multi-Task Compressive Sensing , 2007 .

[643]  Amr Ahmed,et al.  On Tight Approximate Inference of the Logistic-Normal Topic Admixture Model , 2007 .

[644]  Michael A. West,et al.  Dynamic matrix-variate graphical models , 2007 .

[645]  Lawrence Carin,et al.  Multi-Task Learning for Classification with Dirichlet Process Priors , 2007, J. Mach. Learn. Res..

[646]  P. Green,et al.  Bayesian Model-Based Clustering Procedures , 2007 .

[647]  Chong Wang,et al.  Variational Bayesian Approach to Canonical Correlation Analysis , 2007, IEEE Transactions on Neural Networks.

[648]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[649]  Hans-Peter Kriegel,et al.  Fast Inference in Infinite Hidden Relational Models , 2007, MLG.

[650]  P. Zhao,et al.  Grouped and Hierarchical Model Selection through Composite Absolute Penalties , 2007 .

[651]  Tie-Yan Liu,et al.  Learning to rank: from pairwise approach to listwise approach , 2007, ICML '07.

[652]  J. Griffin,et al.  Bayesian adaptive lassos with non-convex penalization , 2007 .

[653]  P. Druilhet,et al.  Invariant HPD credible sets and MAP estimators , 2007 .

[654]  Florian Steinke,et al.  Bayesian Inference and Optimal Design in the Sparse Linear Model , 2007, AISTATS.

[655]  Richard E. Neapolitan,et al.  Learning Bayesian networks , 2007, KDD '07.

[656]  T. Blumensath,et al.  On the Difference Between Orthogonal Matching Pursuit and Orthogonal Least Squares , 2007 .

[657]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[658]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[659]  David P. Wipf,et al.  A New View of Automatic Relevance Determination , 2007, NIPS.

[660]  Jean-Michel Marin,et al.  Bayesian Core: A Practical Approach to Computational Bayesian Statistics , 2010 .

[661]  Trevor Darrell,et al.  The Pyramid Match Kernel: Efficient Learning with Sets of Features , 2007, J. Mach. Learn. Res..

[662]  Kevin P. Murphy,et al.  Exact Bayesian structure learning from uncertain interventions , 2007, AISTATS.

[663]  Ruslan Salakhutdinov,et al.  Probabilistic Matrix Factorization , 2007, NIPS.

[664]  J. Weston,et al.  Approximation Methods for Gaussian Process Regression , 2007 .

[665]  O. Zoeter Bayesian Generalized Linear Models in a Terabyte World , 2007, 2007 5th International Symposium on Image and Signal Processing and Analysis.

[666]  Yoshua Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[667]  John D. Lafferty,et al.  A correlated topic model of Science , 2007, 0708.3601.

[668]  Peter Bühlmann,et al.  Boosting Algorithms: Regularization, Prediction and Model Fitting , 2007, 0804.2752.

[669]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[670]  Mark W. Schmidt,et al.  Learning Graphical Model Structure Using L1-Regularization Paths , 2007, AAAI.

[671]  Yair Weiss,et al.  Minimizing and Learning Energy Functions for Side-Chain Prediction , 2007, RECOMB.

[672]  G. Obozinski Joint covariate selection for grouped classification , 2007 .

[673]  Simon Günter,et al.  A Stochastic Quasi-Newton Method for Online Convex Optimization , 2007, AISTATS.

[674]  S. Fienberg,et al.  Describing Disability Through Individual-Level Mixture Models for Multivariate Binary Data. , 2007, The annals of applied statistics.

[675]  R. Tibshirani,et al.  Sparse inverse covariance estimation with the graphical lasso. , 2008, Biostatistics.

[676]  Yuzo Maruyama,et al.  A g-prior extension for p>n , 2008 .

[677]  James G. Scott,et al.  Feature-Inclusion Stochastic Search for Gaussian Graphical Models , 2008 .

[678]  Yiannis Aloimonos,et al.  Who killed the directed model? , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[679]  Christophe Andrieu,et al.  A tutorial on adaptive MCMC , 2008, Stat. Comput..

[680]  Kristian Kersting,et al.  Social Network Mining with Nonparametric Relational Models , 2008, SNAKDD.

[681]  W. Wong,et al.  Learning Causal Bayesian Network Structures From Experimental Data , 2008 .

[682]  Michael I. Jordan,et al.  DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification , 2008, NIPS.

[683]  Dieter Fox,et al.  GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models , 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[684]  Tamir Hazan,et al.  Convergent Message-Passing Algorithms for Inference over General Graphs with Convex Free Energies , 2008, UAI.

[685]  Benjamin M. Marlin,et al.  Missing Data Problems in Machine Learning , 2008 .

[686]  Yang Xu,et al.  R/BHC: fast Bayesian hierarchical clustering for microarray data , 2009, BMC Bioinformatics.

[687]  Zoubin Ghahramani,et al.  Probabilistic models for data combination in recommender systems , 2008 .

[688]  David Maxwell Chickering,et al.  Here or there: preference judgments for relevance , 2008 .

[689]  M. Sahani,et al.  Counterexamples to variational free energy compactness folk theorems , 2008 .

[690]  Max Welling,et al.  Deterministic Latent Variable Models and Their Pitfalls , 2008, SDM.

[691]  Francis R. Bach,et al.  Sparse probabilistic projections , 2008, NIPS.

[692]  Andrew McCallum,et al.  Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression , 2008, UAI.

[693]  Tong Zhang,et al.  Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models , 2008, NIPS.

[694]  Mark W. Schmidt,et al.  Structure learning in random fields for heart motion abnormality detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[695]  N. Meinshausen,et al.  Stability selection , 2008, 0809.2932.

[696]  Tijmen Tieleman,et al.  Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.

[697]  D. Lizotte Practical bayesian optimization , 2008 .

[698]  C. Lawrence,et al.  Centroid estimation in discrete high-dimensional spaces with applications in biology , 2008, Proceedings of the National Academy of Sciences.

[699]  Michael I. Jordan,et al.  An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators , 2008, ICML '08.

[700]  G. Casella,et al.  The Bayesian Lasso , 2008 .

[701]  Chih-Jen Lin,et al.  A Practical Guide to Support Vector Classification , 2008 .

[702]  Stephen Gould,et al.  Projected Subgradient Methods for Learning Sparse Gaussians , 2008, UAI.

[703]  Jason Weston,et al.  Deep learning via semi-supervised embedding , 2008, ICML '08.

[704]  Danny Bickson,et al.  Gaussian Belief Propagation: Theory and Application , 2008, 0811.2518.

[705]  Arnaud Doucet,et al.  Sparse Bayesian nonparametric regression , 2008, ICML '08.

[706]  Matthias W. Seeger,et al.  Compressed sensing and Bayesian experimental design , 2008, ICML '08.

[707]  Katherine A. Heller,et al.  Bayesian Exponential Family PCA , 2008, NIPS.

[708]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[709]  Yangbo He,et al.  Active Learning of Causal Networks with Intervention Experiments and Optimal Designs , 2008 .

[710]  Michael Elad,et al.  Sparse Representation for Color Image Restoration , 2008, IEEE Transactions on Image Processing.

[711]  Jean-Philippe Vert,et al.  Clustered Multi-Task Learning: A Convex Formulation , 2008, NIPS.

[712]  Yoav Shoham,et al.  Multiagent Systems - Algorithmic, Game-Theoretic, and Logical Foundations , 2009 .

[713]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[714]  N. Meinshausen A note on the Lasso for Gaussian graphical model selection , 2008 .

[715]  K. Lange,et al.  Coordinate descent algorithms for lasso penalized regression , 2008, 0803.3876.

[716]  M. Clyde,et al.  Mixtures of g Priors for Bayesian Variable Selection , 2008 .

[717]  B. Moghaddam,et al.  Sparse regression as a sparse eigenvalue problem , 2008, 2008 Information Theory and Applications Workshop.

[718]  A. Lenkoski Bayesian structural learning and estimation in Gaussian graphical models , 2008 .

[719]  Stephen Gould,et al.  Learning Bounded Treewidth Bayesian Networks , 2008, NIPS.

[720]  Michael R. Lyu,et al.  SoRec: social recommendation using probabilistic matrix factorization , 2008, CIKM '08.

[721]  Francis R. Bach,et al.  Bolasso: model consistent Lasso estimation through the bootstrap , 2008, ICML '08.

[722]  Michael I. Jordan,et al.  Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes , 2008, NIPS.

[723]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[724]  Jianhua Zhao,et al.  Fast ML Estimation for the Mixture of Factor Analyzers via an ECM Algorithm , 2008, IEEE Transactions on Neural Networks.

[725]  Yuhong Guo,et al.  Supervised Exponential Family Principal Component Analysis via Convex Optimization , 2008, NIPS.

[726]  Samuel Kaski,et al.  Probabilistic approach to detecting dependencies between data sets , 2008, Neurocomputing.

[727]  Wotao Yin,et al.  Iteratively reweighted algorithms for compressive sensing , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[728]  Cristian Sminchisescu,et al.  Fast algorithms for large scale conditional 3D prediction , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[729]  Yoram Singer,et al.  Efficient projections onto the l1-ball for learning in high dimensions , 2008, ICML '08.

[730]  A. Dobra Dependency networks for genome-wide data , 2008 .

[731]  Xinsheng Liu,et al.  The EM algorithm for the extended finite mixture of the factor analyzers model , 2008, Comput. Stat. Data Anal..

[732]  C. Rasmussen,et al.  Approximations for Binary Gaussian Process Classification , 2008 .

[733]  Kilian Q. Weinberger,et al.  Feature hashing for large scale multitask learning , 2009, ICML '09.

[734]  Mark W. Schmidt,et al.  Optimizing Costly Functions with Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm , 2009, AISTATS.

[735]  Pushmeet Kohli,et al.  Minimizing sparse higher order energy functions of discrete variables , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[736]  S. Khudanpur,et al.  On projections of Gaussian distributions using maximum likelihood criteria , 2009, 2009 Information Theory and Applications Workshop.

[737]  Alan L. Yuille,et al.  Compositional noisy-logical learning , 2009, ICML '09.

[738]  Chris Hans Bayesian lasso regression , 2009 .

[739]  M. Heaton Bayesian Computation and the Linear Model , 2009 .

[740]  Giovanni Petris,et al.  Dynamic Linear Models with R , 2009 .

[741]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[742]  Brendan J. Frey,et al.  A Binary Variable Model for Affinity Propagation , 2009, Neural Computation.

[743]  Joseph Hilbe,et al.  Data Analysis Using Regression and Multilevel/Hierarchical Models , 2009 .

[744]  S. V. N. Vishwanathan,et al.  Variable Metric Stochastic Approximation Theory , 2009, AISTATS.

[745]  Ruslan Salakhutdinov,et al.  Evaluation methods for topic models , 2009, ICML '09.

[746]  Philip Schniter,et al.  Fast Bayesian Matching Pursuit: Model Uncertainty and Parameter Estimation for Sparse Linear Models , 2009 .

[747]  Aapo Hyvärinen,et al.  Natural Image Statistics - A Probabilistic Approach to Early Computational Vision , 2009, Computational Imaging and Vision.

[748]  James Bailey,et al.  Information theoretic measures for clusterings comparison: is a correction for chance necessary? , 2009, ICML '09.

[749]  Michael I. Jordan,et al.  Optimization of Structured Mean Field Objectives , 2009, UAI.

[750]  Holger Hoefling A Path Algorithm for the Fused Lasso Signal Approximator , 2009, 0910.0526.

[751]  Roman Barták,et al.  Constraint Processing , 2009, Encyclopedia of Artificial Intelligence.

[752]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[753]  Ben Calderhead,et al.  Riemannian Manifold Hamiltonian Monte Carlo , 2009, 0907.1100.

[754]  Larry A. Wasserman,et al.  The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs , 2009, J. Mach. Learn. Res..

[755]  Peter D. Hoff,et al.  A First Course in Bayesian Statistical Methods , 2009 .

[756]  William T. Freeman,et al.  Informative Sensing , 2009, ArXiv.

[757]  Joseph Sill,et al.  Feature-Weighted Linear Stacking , 2009, ArXiv.

[758]  Steffen Staab,et al.  Explicit Versus Latent Concept Models for Cross-Language Information Retrieval , 2009, IJCAI.

[759]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[760]  Jieping Ye,et al.  Finite Domain Constraint Solver Learning , 2009, IJCAI.

[761]  Chi Ho Lo Statistical methods for high throughput genomics , 2009 .

[762]  P. Kuan,et al.  A Hierarchical Semi-Markov Model for Detecting Enrichment with Application to ChIP-Seq Experiments , 2009 .

[763]  Christopher D. Manning,et al.  Hierarchical Bayesian Domain Adaptation , 2009, NAACL.

[764]  Michael Elad,et al.  From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images , 2009, SIAM Rev..

[765]  Stephen J. Wright,et al.  Sparse Reconstruction by Separable Approximation , 2008, IEEE Transactions on Signal Processing.

[766]  H. Rue,et al.  Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations , 2009 .

[767]  Thore Graepel,et al.  Matchbox: Large Scale Bayesian Recommendations , 2009 .

[768]  Ling Huang,et al.  Fast approximate spectral clustering , 2009, KDD.

[769]  Edoardo M. Airoldi,et al.  A Survey of Statistical Network Models , 2009, Found. Trends Mach. Learn..

[770]  Mohammad Emtiyaz Khan,et al.  Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models , 2009, NIPS.

[771]  R. Tibshirani,et al.  A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. , 2009, Biostatistics.

[772]  Yoram Singer,et al.  Boosting with structural sparsity , 2009, ICML '09.

[773]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[774]  Jeffrey N. Rouder,et al.  Bayesian t tests for accepting and rejecting the null hypothesis , 2009, Psychonomic bulletin & review.

[775]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[776]  David Edwards,et al.  Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests , 2010, BMC Bioinformatics.

[777]  Yee Whye Teh,et al.  A stochastic memoizer for sequence data , 2009, ICML '09.

[778]  Yehuda Koren,et al.  The BellKor Solution to the Netflix Grand Prize , 2009 .

[779]  A. Doucet,et al.  Smoothing algorithms for state–space models , 2010 .

[780]  Guillermo Sapiro,et al.  Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations , 2009, NIPS.

[781]  Manfred Opper,et al.  The Variational Gaussian Approximation Revisited , 2009, Neural Computation.

[782]  Patrick Gallinari,et al.  Ranking with ordered weighted pairwise classification , 2009, ICML '09.

[783]  Gero Walter,et al.  Bayesian linear regression , 2009 .

[784]  Robert Tibshirani,et al.  Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods , 2009, J. Mach. Learn. Res..

[785]  Michael Elad,et al.  A Plurality of Sparse Representations Is Better Than the Sparsest One Alone , 2009, IEEE Transactions on Information Theory.

[786]  Alexander J. Smola,et al.  Collaborative Spam Filtering with the Hashing Trick , 2009 .

[787]  David Barber,et al.  A Simple Alternative Derivation of the Expectation Correction Algorithm , 2009, IEEE Signal Processing Letters.

[788]  Mark W. Schmidt,et al.  Modeling Discrete Interventional Data using Directed Cyclic Graphical Models , 2009, UAI.

[789]  Magnus Rattray,et al.  Inference algorithms and learning theory for Bayesian sparse factor analysis , 2009 .

[790]  Carl E. Rasmussen,et al.  Gaussian process dynamic programming , 2009, Neurocomputing.

[791]  Lawrence Carin,et al.  Nonparametric factor analysis with beta process priors , 2009, ICML '09.

[792]  David J. Lunn,et al.  Generic reversible jump MCMC using graphical models , 2009, Stat. Comput..

[793]  Volkan Cevher,et al.  Learning with Compressible Priors , 2009, NIPS.

[794]  Geoffrey E. Hinton,et al.  Replicated Softmax: an Undirected Topic Model , 2009, NIPS.

[795]  R. O’Hara,et al.  A review of Bayesian variable selection methods: what, how and which , 2009 .

[796]  Helen Armstrong,et al.  Bayesian covariance matrix estimation using a mixture of decomposable graphical models , 2007, Stat. Comput..

[797]  M. Wand Semiparametric Regression and Graphical Models , 2009 .

[798]  Honglak Lee,et al.  Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.

[799]  Peter Norvig,et al.  The Unreasonable Effectiveness of Data , 2009, IEEE Intelligent Systems.

[800]  Yehuda Koren,et al.  Collaborative filtering with temporal dynamics , 2009, KDD.

[801]  O. Zobay Mean field inference for the Dirichlet process mixture model , 2009 .

[802]  S. Shankar Sastry,et al.  Markov Chain Monte Carlo Data Association for Multi-Target Tracking , 2009, IEEE Transactions on Automatic Control.

[803]  Hal Daumé,et al.  Multi-Label Prediction via Sparse Infinite CCA , 2009, NIPS.

[804]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[805]  Yehuda Koren,et al.  Matrix Factorization Techniques for Recommender Systems , 2009, Computer.

[806]  M. Maathuis,et al.  Estimating high-dimensional intervention effects from observational data , 2008, 0810.4214.

[807]  Mark W. Schmidt Optimization Methods for ℓ1-Regularization , 2009 .

[808]  Dafna Shahaf,et al.  Learning Thin Junction Trees via Graph Cuts , 2009, AISTATS.

[809]  Joshua B Tenenbaum,et al.  Theory-based causal induction. , 2009, Psychological review.

[810]  H. Kawakatsu,et al.  EM Algorithms for Ordered Probit Models with Endogenous Regressors , 2009 .

[811]  Hossein Mobahi,et al.  Deep learning from temporal coherence in video , 2009, ICML '09.

[812]  Rajat Raina,et al.  Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.

[813]  Ramesh Nallapati,et al.  Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora , 2009, EMNLP.

[814]  Chong Wang,et al.  Reading Tea Leaves: How Humans Interpret Topic Models , 2009, NIPS.

[815]  Junyong Park,et al.  Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification , 2009, J. Mach. Learn. Res..

[816]  Alun Thomas,et al.  Enumerating the decomposable neighbors of a decomposable graph under a simple perturbation scheme , 2009, Comput. Stat. Data Anal..

[817]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[818]  Richard S. Zemel,et al.  Collaborative prediction and ranking with non-random missing data , 2009, RecSys '09.

[819]  Hugo Larochelle,et al.  Efficient Learning of Deep Boltzmann Machines , 2010, AISTATS.

[820]  Sorin Lerner,et al.  Latent Variable Models for Predicting File Dependencies in Large-Scale Software Development , 2010, NIPS.

[821]  Kian Ming Adam Chai,et al.  Multi-task learning with Gaussian processes , 2010 .

[822]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[823]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[824]  Nando de Freitas,et al.  Sparsity priors and boosting for learning localized distributed feature representations , 2010 .

[825]  Ryo Yoshida,et al.  Bayesian Learning in Sparse Graphical Factor Models via Annealed Entropy , 2010 .

[826]  William T. Freeman,et al.  Latent hierarchical structural learning for object detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[827]  Allen Y. Yang,et al.  Fast ℓ1-minimization algorithms and an application in robust face recognition: A review , 2010, 2010 IEEE International Conference on Image Processing.

[828]  J. Hacker,et al.  Winner-Take-All Politics: How Washington Made the Rich Richer--and Turned Its Back on the Middle Class , 2010 .

[829]  D. Sontag Introduction to Dual Decomposition for Inference , 2010 .

[830]  Elchanan Mossel,et al.  The Computational Complexity of Estimating Convergence Time , 2010, ArXiv.

[831]  Eric P. Xing,et al.  Conditional Topic Random Fields , 2010, ICML.

[832]  Robert F. Harrison,et al.  A sparse multinomial probit model for classification , 2011, Pattern Analysis and Applications.

[833]  Jon D. McAuliffe,et al.  Variational Inference for Large-Scale Models of Discrete Choice , 2007, 0712.2526.

[834]  Francis R. Bach,et al.  Online Learning for Latent Dirichlet Allocation , 2010, NIPS.

[835]  Ben Taskar,et al.  Sidestepping Intractable Inference with Structured Ensemble Cascades , 2010, NIPS.

[836]  David P. Wipf,et al.  Iterative Reweighted ℓ1 and ℓ2 Methods for Finding Sparse Solutions , 2010, IEEE J. Sel. Top. Signal Process..

[837]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[838]  James Martens,et al.  Deep learning via Hessian-free optimization , 2010, ICML.

[839]  John K Kruschke,et al.  Bayesian data analysis. , 2010, Wiley interdisciplinary reviews. Cognitive science.

[840]  James G. Scott,et al.  The horseshoe estimator for sparse signals , 2010 .

[841]  David M. Blei,et al.  Hierarchical relational models for document networks , 2009, 0909.4331.

[842]  Samuel Kaski,et al.  Bayesian exponential family projections for coupled data sources , 2010, UAI.

[843]  W. R. Gilks,et al.  Adaptive Rejection Sampling for Gibbs Sampling , 2010 .

[844]  Patrick Gallinari,et al.  Erratum: SGDQN is Less Careful than Expected , 2010, J. Mach. Learn. Res..

[845]  Tapani Raiko,et al.  Practical Approaches to Principal Component Analysis in the Presence of Missing Values , 2022, TKK Reports in Information and Computer Science.

[846]  Zoubin Ghahramani,et al.  Variational Inference for Nonparametric Multiple Clustering , 2010 .

[847]  Hisayuki Hara,et al.  A Localization Approach to Improve Iterative Proportional Scaling in Gaussian Graphical Models , 2008, 0802.2581.

[848]  Stefan Schaal,et al.  Efficient Learning and Feature Selection in High-Dimensional Regression , 2010, Neural Computation.

[849]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[850]  Bo Chen,et al.  Deep Learning of Invariant Spatio-Temporal Features from Video , 2010 .

[851]  Richard S. Zemel,et al.  HOP-MAP: Efficient Message Passing with High Order Potentials , 2010, AISTATS.

[852]  Chih-Jen Lin,et al.  A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification , 2010, J. Mach. Learn. Res..

[853]  G. Casella,et al.  Penalized regression, standard errors, and Bayesian lassos , 2010 .

[854]  David M. Blei,et al.  Probabilistic topic models , 2012, Commun. ACM.

[855]  Xinhua Zhang,et al.  Bayesian Online Learning for Multi-label and Multi-variate Performance Measures , 2010, AISTATS.

[856]  Meritxell Vinyals,et al.  Worst-case bounds on the quality of max-product fixed-points , 2010, NIPS.

[857]  Samuel Kaski,et al.  Variational Bayesian Mixture of Robust CCA Models , 2010, ECML/PKDD.

[858]  Bernt Schiele,et al.  Automatic discovery of meaningful object parts with latent CRFs , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[859]  Tamir Hazan,et al.  Norm-Product Belief Propagation: Primal-Dual Message-Passing for Approximate Inference , 2009, IEEE Transactions on Information Theory.

[860]  A. Doucet,et al.  A Hierarchical Bayesian Framework for Constructing Sparsity-inducing Priors , 2010, 1009.1914.

[861]  Mohammad Emtiyaz Khan,et al.  Variational bounds for mixed-data factor analysis , 2010, NIPS.

[862]  Ryan P. Adams,et al.  Learning the Structure of Deep Sparse Graphical Models , 2009, AISTATS.

[863]  Jason Weston,et al.  Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.

[864]  Arindam Banerjee,et al.  Residual Bayesian Co-clustering for Matrix Approximation , 2010, SDM.

[865]  J. Tenenbaum,et al.  A probabilistic model of theory formation , 2010, Cognition.

[866]  Sonja Kuhnt,et al.  Design and analysis of computer experiments , 2010 .

[867]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[868]  J. Griffin,et al.  Inference with normal-gamma prior distributions in regression problems , 2010 .

[869]  J. Vanhatalo,et al.  Approximate inference for disease mapping with sparse Gaussian processes , 2010, Statistics in medicine.

[870]  Luca Maria Gambardella,et al.  Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[871]  Pedro M. Domingos,et al.  Learning Efficient Markov Networks , 2010, NIPS.

[872]  Tom M. Mitchell,et al.  Learning to Tag from Open Vocabulary Labels , 2010, ECML/PKDD.

[873]  Nando de Freitas,et al.  Inductive Principles for Restricted Boltzmann Machine Learning , 2010, AISTATS.

[874]  Yurii Nesterov,et al.  Generalized Power Method for Sparse Principal Component Analysis , 2008, J. Mach. Learn. Res..

[875]  Xi Chen,et al.  Graph-Structured Multi-task Regression and an Efficient Optimization Method for General Fused Lasso , 2010, ArXiv.

[876]  Jieping Ye,et al.  A shared-subspace learning framework for multi-label classification , 2010, TKDD.

[877]  Guillermo Sapiro,et al.  Online Learning for Matrix Factorization and Sparse Coding , 2009, J. Mach. Learn. Res..

[878]  Yoshua Bengio,et al.  Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[879]  J. Vanhatalo Speeding up the Inference in Gaussian Process Models , 2010 .

[880]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[881]  Geoffrey E. Hinton,et al.  Modeling pixel means and covariances using factorized third-order Boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[882]  Ying Cui,et al.  Learning multiple nonredundant clusterings , 2010, TKDD.

[883]  S. Frühwirth-Schnatter,et al.  Data Augmentation and MCMC for Binary and Multinomial Logit Models , 2010 .

[884]  H. Massam,et al.  The mode oriented stochastic search (MOSS) algorithm for log-linear models with conjugate priors , 2010 .

[885]  Peter Bühlmann,et al.  Predicting causal effects in large-scale systems from observational data , 2010, Nature Methods.

[886]  Steven L. Scott,et al.  A modern Bayesian look at the multi-armed bandit , 2010 .

[887]  Alexander M. Rush,et al.  Dual Decomposition for Parsing with Non-Projective Head Automata , 2010, EMNLP.

[888]  Arthur M. Geoffrion,et al.  Lagrangian Relaxation for Integer Programming , 2010, 50 Years of Integer Programming.

[889]  Nando de Freitas,et al.  A tutorial on stochastic approximation algorithms for training Restricted Boltzmann Machines and Deep Belief Nets , 2010, 2010 Information Theory and Applications Workshop (ITA).

[890]  Veselin Stoyanov,et al.  Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure , 2011, AISTATS.

[891]  Radford M. Neal Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .

[892]  Zoubin Ghahramani,et al.  Approximate inference for the loss-calibrated Bayesian , 2011, AISTATS.

[893]  A. Fraser Hidden Markov Models and Dynamical Systems , 2011 .

[894]  Mikko Koivisto,et al.  Ancestor Relations in the Presence of Unobserved Variables , 2011, ECML/PKDD.

[895]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[896]  Sharon Bertsch McGrayne,et al.  The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy , 2011 .

[897]  Susan T. Dumais,et al.  Partially labeled topic models for interpretable text mining , 2011, KDD.

[898]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.

[899]  K. Kersting,et al.  Statistical Relational AI: Logic, Probability and Computation , 2011 .

[900]  Radford M. Neal MCMC Using Hamiltonian Dynamics , 2011, 1206.1901.

[901]  Nicholas G. Polson,et al.  Data augmentation for support vector machines , 2011 .

[902]  T. Speed A Correlation for the 21st Century , 2011, Science.

[903]  Nikos Komodakis,et al.  MRF Energy Minimization and Beyond via Dual Decomposition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[904]  J. Grossman The Likelihood Principle , 2011 .

[905]  Antonio Torralba,et al.  Trees and beyond: exploiting and improving tree-structured graphical models , 2011 .

[906]  David A. McAllester,et al.  Object Detection with Grammar Models , 2011, NIPS.

[907]  Charles Soussen,et al.  From Bernoulli–Gaussian Deconvolution to Sparse Signal Restoration , 2011, IEEE Transactions on Signal Processing.

[908]  Eric Moulines,et al.  Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.

[909]  Steven Levy,et al.  In the Plex: How Google Thinks, Works, and Shapes Our Lives , 2011 .

[910]  Lukás Burget,et al.  Empirical Evaluation and Combination of Advanced Language Modeling Techniques , 2011, INTERSPEECH.

[911]  Olivier Cappé,et al.  Online Expectation Maximisation , 2010, 1011.1745.

[912]  Bart De Moor,et al.  Subspace Identification for Linear Systems: Theory - Implementation - Applications , 2011 .

[913]  Mohammad Emtiyaz Khan,et al.  Piecewise Bounds for Estimating Bernoulli-Logistic Latent Gaussian Models , 2011, ICML.

[914]  Pascal Vincent,et al.  A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.

[915]  Geoffrey E. Hinton,et al.  Using very deep autoencoders for content-based image retrieval , 2011, ESANN.

[916]  M. Wand,et al.  Mean field variational bayes for elaborate distributions , 2011 .

[917]  Marshall F Chalverus,et al.  The Black Swan: The Impact of the Highly Improbable , 2007 .

[918]  Lihong Li,et al.  An Empirical Evaluation of Thompson Sampling , 2011, NIPS.

[919]  Shuang-Hong Yang,et al.  Collaborative competitive filtering: learning recommender using context of user choice , 2011, SIGIR.

[920]  Bernhard Schölkopf,et al.  Support Vector Machines as Probabilistic Models , 2011, ICML.

[921]  Christopher K. I. Williams,et al.  Greedy Learning of Binary Latent Trees , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[922]  Ben Taskar,et al.  Learning Determinantal Point Processes , 2011, UAI.

[923]  Pedro M. Domingos,et al.  Sum-product networks: A new deep architecture , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[924]  Wei Chu,et al.  Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms , 2010, WSDM '11.

[925]  Joshua B. Tenenbaum,et al.  Learning to Learn with Compound HD Models , 2011, NIPS.

[926]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[927]  Purnamrita Sarkar,et al.  A scalable bootstrap for massive data , 2011, 1112.5016.

[928]  E. Wagenmakers,et al.  Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011). , 2011, Journal of personality and social psychology.

[929]  S. L. Scott Data augmentation, frequentist estimation, and the Bayesian analysis of multinomial logit models , 2011 .

[930]  Bhaskar D. Rao,et al.  Latent Variable Bayesian Models for Promoting Sparsity , 2011, IEEE Transactions on Information Theory.

[931]  Alan L. Yuille,et al.  Probabilistic models of vision and max-margin methods , 2012 .

[932]  D. Dunson,et al.  Simplex Factor Models for Multivariate Unordered Categorical Data , 2012, Journal of the American Statistical Association.

[933]  Sergei Vassilvitskii,et al.  Scalable K-Means++ , 2012, Proc. VLDB Endow..

[934]  Christian P. Robert,et al.  Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction , 2012 .

[935]  Trevor J. Hastie,et al.  The Graphical Lasso: New Insights and Alternatives , 2011, Electronic journal of statistics.

[936]  Dong Yu,et al.  Conversational Speech Transcription Using Context-Dependent Deep Neural Networks , 2012, ICML.

[937]  Simon J. D. Prince,et al.  Computer Vision: Models, Learning, and Inference , 2012 .

[938]  Hassan S. Bakouch Time series: Modeling, Computation, and Inference , 2012 .

[939]  Neil D. Lawrence,et al.  A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models , 2010, J. Mach. Learn. Res..

[940]  T. Minka Estimating a Dirichlet distribution , 2012 .

[941]  A. Doucet,et al.  Efficient Bayesian Inference for Multivariate Probit Models With Sparse Inverse Correlation Matrices , 2012 .

[942]  Jascha Sohl-Dickstein,et al.  Efficient and optimal binary Hopfield associative memory storage using minimum probability flow , 2012, 1204.2916.

[943]  Ahn Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring , 2012 .

[944]  Rich Caruana,et al.  A Dozen Tricks with Multitask Learning , 1996, Neural Networks: Tricks of the Trade.

[945]  Harry Joe,et al.  Composite Likelihood Methods , 2012 .

[946]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[947]  Uffe Kjærulff,et al.  Bayesian Networks and Influence Diagrams: A Guide to Construction and Analysis , 2007, Information Science and Statistics.

[948]  Jaeyong Lee,et al.  Generalized Double Pareto Shrinkage , 2011, Statistica Sinica.

[949]  Alex M. Andrew,et al.  Boosting: Foundations and Algorithms , 2012 .

[950]  Tom Schaul,et al.  No more pesky learning rates , 2012, ICML.

[951]  Sebastian Tschiatschek,et al.  Introduction to Probabilistic Graphical Models , 2014 .

[952]  Tyler Cymet,et al.  The era of big data. , 2014, Maryland medicine : MM : a publication of MEDCHI, the Maryland State Medical Society.

[953]  Weiqiang Dong On Bias, Variance, 0/1-Loss, and the Curse of Dimensionality , 2014 .

[954]  Din J. Wasem Mining of Massive Datasets , 2014 .

[955]  R. Tibshirani,et al.  Generalized Additive Models , 1986 .

[956]  Robert L. Wolpert,et al.  Statistical Inference , 2019, Encyclopedia of Social Network Analysis and Mining.

[957]  Maurizio Dapor Monte Carlo Strategies , 2020, Transport of Energetic Electrons in Solids.

[958]  Andrew Gelman,et al.  The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo , 2011, J. Mach. Learn. Res..