Paper Information - Big Learning with Bayesian Methods

Big Learning with Bayesian Methods

Explosive growth in data and the availability of cheap computing resources have sparked increasing interest in Big Learning, an emerging subfield that studies scalable machine learning algorithms, systems, and applications with Big Data. Bayesian methods represent one important class of statistical methods for machine learning, with substantial recent developments in adaptive, flexible, and scalable Bayesian learning. This article provides a survey of recent advances in Big Learning with Bayesian methods, termed Big Bayesian Learning, including nonparametric Bayesian methods for adaptively inferring model complexity, regularized Bayesian inference for improving flexibility via posterior regularization, and scalable algorithms and systems based on stochastic subsampling and distributed computing for dealing with large-scale applications.
