Stochastic Gradient Methods for Principled Estimation with Large Data Sets

[1]  O. Cappé,et al.  On‐line expectation–maximization algorithm for latent data models , 2009 .

[2]  M. Girolami,et al.  Riemann manifold Langevin and Hamiltonian Monte Carlo methods , 2011, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[3]  Chong Wang,et al.  Stochastic variational inference , 2012, J. Mach. Learn. Res..

[4]  Steven J. Nowlan,et al.  Soft competitive adaptation: neural network learning algorithms based on fitting statistical mixtures , 1991 .

[5]  P. Toulis,et al.  Implicit stochastic gradient descent , 2014 .

[6]  D. Ruppert,et al.  Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .

[7]  Radford M. Neal MCMC Using Hamiltonian Dynamics , 2011, 1206.1901.

[8]  Dimitri P. Bertsekas,et al.  Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems , 2014, Math. Oper. Res..

[9]  C. G. Broyden A Class of Methods for Solving Nonlinear Simultaneous Equations , 1965 .

[10]  Edoardo M. Airoldi,et al.  Towards Stability and Optimality in Stochastic Gradient Descent , 2015, AISTATS.

[11]  Kenji Fukumizu,et al.  Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons , 2000, Neural Computation.

[12]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[13]  P. Lions,et al.  Splitting Algorithms for the Sum of Two Nonlinear Operators , 1979 .

[14]  J. Blum Multidimensional Stochastic Approximation Methods , 1954 .

[15]  J. Sacks Asymptotic Distribution of Stochastic Approximation Procedures , 1958 .

[16]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[17]  Yann LeCun,et al.  Large Scale Online Learning , 2003, NIPS.

[18]  Wei Xu,et al.  Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent , 2011, ArXiv.

[19]  Tom Schaul,et al.  No more pesky learning rates , 2012, ICML.

[20]  Stephen P. Boyd,et al.  Proximal Algorithms , 2013, Found. Trends Optim..

[21]  J. Nagumo,et al.  A learning method for system identification , 1967, IEEE Transactions on Automatic Control.

[22]  Shun-ichi Amari,et al.  Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[23]  Alexander Shapiro,et al.  Stochastic Approximation approach to Stochastic Programming , 2013 .

[24]  P. Green Iteratively reweighted least squares for maximum likelihood estimation , 1984 .

[25]  Yoshua Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[26]  D. Ruppert A NEW DYNAMIC STOCHASTIC APPROXIMATION PROCEDURE , 1979 .

[27]  Donald Geman,et al.  Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images , 1984 .

[28]  Noureddine El Karoui Spectrum estimation for large dimensional covariance matrices using random matrix theory , 2006, math/0609418.

[29]  Shin Ishii,et al.  On-line EM Algorithm for the Normalized Gaussian Network , 2000, Neural Computation.

[30]  Tong Zhang,et al.  Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.

[31]  B. Schölkopf,et al.  Modeling Human Motion Using Binary Latent Variables , 2007 .

[32]  L. Rosasco,et al.  Convergence of Stochastic Proximal Gradient Algorithm , 2014, Applied Mathematics & Optimization.

[33]  Geoffrey E. Hinton,et al.  Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[34]  Babak Hassibi,et al.  The p-norm generalization of the LMS algorithm for adaptive filtering , 2003, IEEE Transactions on Signal Processing.

[35]  Andrzej Cichocki,et al.  Stability Analysis of Learning Algorithms for Blind Source Separation , 1997, Neural Networks.

[36]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[37]  Warren B. Powell,et al.  Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming , 2006, Machine Learning.

[38]  E. L. Lehmann,et al.  Theory of point estimation , 1950 .

[39]  Dirk T. M. Slock,et al.  On the convergence behavior of the LMS and the normalized LMS algorithms , 1993, IEEE Trans. Signal Process..

[40]  Eric Moulines,et al.  Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) , 2013, NIPS.

[41]  D. Sakrison Efficient recursive estimation; application to estimating the parameters of a covariance function , 1965 .

[42]  Martin Kiefel,et al.  Quasi-Newton Methods: A New Direction , 2012, ICML.

[43]  R. Has’minskiĭ,et al.  Stochastic Approximation and Recursive Estimation , 1976 .

[44]  Max Welling,et al.  Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget , 2013, ICML 2014.

[45]  R. Rockafellar Monotone Operators and the Proximal Point Algorithm , 1976 .

[46]  K. Lange A gradient algorithm locally equivalent to the EM algorithm , 1995 .

[47]  V. Fabian Asymptotically Efficient Stochastic Approximation; The RM Case , 1973 .

[48]  Bernard Widrow,et al.  Adaptive switching circuits , 1988 .

[49]  Lin Xiao,et al.  A Proximal Stochastic Gradient Method with Progressive Variance Reduction , 2014, SIAM J. Optim..

[50]  E. Airoldi,et al.  Stochastic gradient descent methods for estimation with large data sets , 2015, 1509.06459.

[51]  Yee Whye Teh,et al.  Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.

[52]  Boris Polyak,et al.  Acceleration of stochastic approximation by averaging , 1992 .

[53]  Patrick Gallinari,et al.  SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res..

[54]  W. Gardner Learning characteristics of stochastic-gradient-descent algorithms: A general study, analysis, and critique , 1984 .

[55]  Miguel Á. Carreira-Perpiñán,et al.  On Contrastive Divergence Learning , 2005, AISTATS.

[56]  R. Fisher,et al.  On the Mathematical Foundations of Theoretical Statistics , 1922 .

[57]  H. Robbins A Stochastic Approximation Method , 1951 .

[58]  Peter L. Bartlett,et al.  Implicit Online Learning , 2010, ICML.

[59]  Xi Chen,et al.  Variance Reduction for Stochastic Gradient Optimization , 2013, NIPS.

[60]  J. H. Venter An extension of the Robbins-Monro procedure , 1967 .

[61]  Tong Zhang,et al.  Solving large scale linear prediction problems using stochastic gradient descent algorithms , 2004, ICML.

[62]  D. Titterington Recursive Parameter Estimation Using Incomplete Data , 1984 .

[63]  Edoardo M. Airoldi,et al.  Statistical analysis of stochastic gradient methods for generalized linear models , 2014, ICML.

[64]  G. Pflug,et al.  Stochastic approximation and optimization of random systems , 1992 .

[65]  Dimitri P. Bertsekas,et al.  Incremental proximal methods for large scale convex optimization , 2011, Math. Program..

[66]  John N. Tsitsiklis,et al.  Neuro-dynamic programming: an overview , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.

[67]  Yoshua Bengio,et al.  Justifying and Generalizing Contrastive Divergence , 2009, Neural Computation.

[68]  Krzysztof A. Krakowski,et al.  A Geometric View of Non-Linear On-Line Stochastic Gradient Descent , 2007 .

[69]  Geoffrey E. Hinton,et al.  A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[70]  James C. Spall,et al.  Introduction to stochastic search and optimization - estimation, simulation, and control , 2003, Wiley-Interscience series in discrete mathematics and optimization.

[71]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[72]  M. Kendall Statistical Methods for Research Workers , 1937, Nature.

[73]  Abhijit Gosavi,et al.  Reinforcement Learning: A Tutorial Survey and Recent Advances , 2009, INFORMS J. Comput..

[74]  Jalal Almhana,et al.  Online EM algorithm for mixture with application to internet traffic modeling , 2004 .

[75]  Edoardo M. Airoldi,et al.  Scalable estimation strategies based on stochastic approximations: classical results and new insights , 2015, Statistics and Computing.

[76]  N. Pillai,et al.  Ergodicity of Approximate MCMC Chains with Applications to Large Data Sets , 2014, 1405.0182.

[77]  Edoardo M. Airoldi,et al.  Stability and optimality in stochastic gradient descent , 2015, ArXiv.

[78]  H. Robbins,et al.  Adaptive Design and Stochastic Approximation , 1979 .

[79]  Edoardo M. Airoldi,et al.  Implicit Temporal Differences , 2014, ArXiv.

[80]  J. Spall Adaptive stochastic approximation by the simultaneous perturbation method , 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).

[81]  V. Fabian On Asymptotic Normality in Stochastic Approximation , 1968 .

[82]  Mark W. Schmidt,et al.  Minimizing finite sums with the stochastic average gradient , 2013, Mathematical Programming.

[83]  Olivier Cappé Online EM Algorithm for Hidden Markov Models , 2009, 0908.2359.

[84]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[85]  Eric Moulines,et al.  Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.

[86]  Léon Bottou,et al.  On-line learning for very large data sets , 2005 .

[87]  P. Dupuis,et al.  On sampling controlled stochastic approximation , 1991 .

[88]  Simon Günter,et al.  A Stochastic Quasi-Newton Method for Online Convex Optimization , 2007, AISTATS.

[89]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[90]  C. Z. Wei Multivariate Adaptive Stochastic Approximation , 1987 .

[91]  Hiroshi Nakagawa,et al.  Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process , 2014, ICML.

[92]  L. Younes On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates , 1999 .