Big Learning with Bayesian Methods

Explosive growth in data and the availability of cheap computing resources have sparked increasing interest in Big learning, an emerging subfield that studies scalable machine learning algorithms, systems and applications with Big Data. Bayesian methods represent one important class of statistical methods for machine learning, with substantial recent developments in adaptive, flexible and scalable Bayesian learning. This article surveys recent advances in Big learning with Bayesian methods, termed Big Bayesian Learning. These include nonparametric Bayesian methods that adaptively infer model complexity, regularized Bayesian inference that improves flexibility via posterior regularization, and scalable algorithms and systems based on stochastic subsampling and distributed computing for large-scale applications.
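
The phrase "adaptively inferring model complexity" can be made concrete with the Chinese restaurant process (CRP), the sequential view of the Dirichlet process. The sketch below, a minimal illustration using only the standard library, draws cluster assignments from a CRP prior. The number of occupied clusters is not fixed in advance; it grows with the data (roughly logarithmically for a fixed concentration alpha). The function name sample_crp and its parameters are hypothetical and chosen for illustration.

```python
import random


def sample_crp(n, alpha, seed=0):
    """Draw cluster (table) assignments for n items from a Chinese restaurant process.

    With i items already seated, the next item joins existing table k with
    probability counts[k] / (i + alpha) and opens a new table with
    probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items at table k
    assignments = []   # table index chosen for each item
    for _ in range(n):
        weights = counts + [alpha]  # existing tables first, then a new table
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)        # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        z = sample_crp(n, alpha=1.0, seed=42)
        print(n, "items ->", len(set(z)), "clusters")
```

In a full Dirichlet process mixture, these prior weights would be multiplied by each cluster's likelihood for the item inside a Gibbs sampler, so the data decide how many clusters are used.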
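
As an example of scalable inference via stochastic subsampling, the following is a minimal sketch of stochastic gradient Langevin dynamics (SGLD, Welling and Teh, 2011). It assumes a toy model x_i ~ N(theta, 1) with prior theta ~ N(0, prior_var), chosen so that the exact posterior is available for comparison. Each step touches only a random minibatch, rescales its likelihood gradient by N / batch_size, and injects Gaussian noise whose variance equals the step size. The step-size schedule a (b + t)^(-gamma) follows the SGLD paper, but the specific constants, the with-replacement minibatching, and the function name sgld_gaussian_mean are illustrative assumptions.

```python
import numpy as np


def sgld_gaussian_mean(x, n_iters=5000, batch_size=100, prior_var=10.0,
                       a=1e-4, b=10.0, gamma=0.55, seed=0):
    """SGLD for theta in the toy model x_i ~ N(theta, 1), theta ~ N(0, prior_var)."""
    rng = np.random.default_rng(seed)
    N = len(x)
    theta = 0.0
    samples = np.empty(n_iters)
    for t in range(n_iters):
        eps = a * (b + t) ** (-gamma)                       # decreasing step size
        batch = x[rng.integers(0, N, size=batch_size)]      # minibatch (with replacement)
        grad_prior = -theta / prior_var                     # d/dtheta log p(theta)
        grad_lik = (N / batch_size) * np.sum(batch - theta) # rescaled minibatch gradient
        noise = rng.normal(0.0, np.sqrt(eps))               # injected Langevin noise
        theta += 0.5 * eps * (grad_prior + grad_lik) + noise
        samples[t] = theta
    return samples


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(2.0, 1.0, size=10_000)
    samples = sgld_gaussian_mean(x)
    exact_mean = x.sum() / (len(x) + 1.0 / 10.0)  # posterior precision = N + 1/prior_var
    print("SGLD mean:", samples[2500:].mean(), "exact:", exact_mean)
```

The per-step cost depends on batch_size rather than N. This independence from the full dataset size is what lets subsampling methods scale, at the price of extra gradient noise that the decreasing step size controls.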
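
Distributed computing can be illustrated with a consensus-style combination of subposterior draws, in the spirit of consensus Monte Carlo (Scott et al.). The data are split across shards. Each shard samples from its own subposterior under the fractional prior p(theta)^(1/S), and the draws are merged by a precision-weighted average. This sketch assumes the same Gaussian toy model as above, x_i ~ N(theta, 1) with prior N(0, prior_var). Under that model each subposterior is Gaussian and can be sampled exactly, and the weighted combination recovers the full posterior exactly. In practice each worker would run its own MCMC chain, and the weights would be estimated from its draws. The function names are hypothetical.

```python
import numpy as np


def shard_posterior_draws(x_shard, n_shards, prior_var, n_draws, rng):
    """Exact draws from one shard's subposterior for x_i ~ N(theta, 1).

    The fractional prior N(0, prior_var)^(1/n_shards) is N(0, n_shards * prior_var).
    Returns the draws and the subposterior precision used as the combination weight.
    """
    precision = 1.0 / (n_shards * prior_var) + len(x_shard)
    mean = x_shard.sum() / precision
    draws = rng.normal(mean, 1.0 / np.sqrt(precision), size=n_draws)
    return draws, precision


def consensus_monte_carlo(x, n_shards=4, prior_var=10.0, n_draws=20000, seed=0):
    """Combine per-shard draws by a precision-weighted average."""
    rng = np.random.default_rng(seed)
    draws, weights = [], []
    for shard in np.array_split(x, n_shards):   # each shard would live on one worker
        d, w = shard_posterior_draws(shard, n_shards, prior_var, n_draws, rng)
        draws.append(d)
        weights.append(w)
    draws = np.stack(draws)                      # shape (n_shards, n_draws)
    weights = np.array(weights)[:, None]         # shape (n_shards, 1)
    return (weights * draws).sum(axis=0) / weights.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(2.0, 1.0, size=10_000)
    combined = consensus_monte_carlo(x)
    precision = len(x) + 1.0 / 10.0
    print("consensus mean/std:", combined.mean(), combined.std())
    print("exact mean/std:    ", x.sum() / precision, 1.0 / np.sqrt(precision))
```

Only the shard draws cross the network, so communication happens once at the end rather than at every iteration. The Gaussian case is the one where this combination is exact; for non-Gaussian subposteriors it is an approximation.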
