Scalable estimation strategies based on stochastic approximations: classical results and new insights
[1] Dimitri P. Bertsekas,et al. Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems , 2014, Math. Oper. Res..
[2] O. Cappé,et al. On‐line expectation–maximization algorithm for latent data models , 2009 .
[3] C. G. Broyden. A Class of Methods for Solving Nonlinear Simultaneous Equations , 1965 .
[4] R. Douglas Martin,et al. Robust estimation via stochastic approximation , 1975, IEEE Trans. Inf. Theory.
[5] M. Girolami,et al. Riemann manifold Langevin and Hamiltonian Monte Carlo methods , 2011, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[6] Jalal Almhana,et al. Online EM algorithm for mixture with application to internet traffic modeling , 2004 .
[7] Léon Bottou,et al. On-line learning for very large data sets: Research Articles , 2005 .
[8] J. Nagumo,et al. A learning method for system identification , 1967, IEEE Transactions on Automatic Control.
[9] N. Pillai,et al. Ergodicity of Approximate MCMC Chains with Applications to Large Data Sets , 2014, 1405.0182.
[10] K. Chung. On a Stochastic Approximation Method , 1954 .
[11] H. Robbins,et al. Adaptive Design and Stochastic Approximation , 1979 .
[12] Edoardo M. Airoldi,et al. Implicit Temporal Differences , 2014, ArXiv.
[13] V. Fabian. On Asymptotic Normality in Stochastic Approximation , 1968 .
[14] D. Ruppert,et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .
[15] Radford M. Neal. MCMC Using Hamiltonian Dynamics , 2011, 1206.1901.
[16] Manfred K. Warmuth,et al. Additive versus exponentiated gradient updates for linear prediction , 1995, STOC '95.
[17] Tom Schaul,et al. No more pesky learning rates , 2012, ICML.
[18] P. Green. Iteratively reweighted least squares for maximum likelihood estimation , 1984 .
[19] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[20] Léon Bottou,et al. On-line learning for very large data sets , 2005 .
[21] Olivier Cappé. Online EM Algorithm for Hidden Markov Models , 2009, 0908.2359.
[22] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.
[23] Stephen P. Boyd,et al. Proximal Algorithms , 2013, Found. Trends Optim..
[24] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.
[25] Donald Geman,et al. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images , 1984 .
[26] Shin Ishii,et al. On-line EM Algorithm for the Normalized Gaussian Network , 2000, Neural Computation.
[27] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[28] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[29] Geoffrey E. Hinton,et al. Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.
[30] P. Dupuis,et al. On sampling controlled stochastic approximation , 1991 .
[31] Simon Günter,et al. A Stochastic Quasi-Newton Method for Online Convex Optimization , 2007, AISTATS.
[32] Yann LeCun,et al. Large Scale Online Learning , 2003, NIPS.
[33] Patrick Gallinari,et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res..
[34] Han-Fu Chen,et al. Asymptotically efficient stochastic approximation , 1993 .
[35] Marc Teboulle,et al. Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..
[36] Noureddine El Karoui. Spectrum estimation for large dimensional covariance matrices using random matrix theory , 2006, math/0609418.
[37] R. Fisher,et al. On the Mathematical Foundations of Theoretical Statistics , 1922 .
[38] Mikhail Borisovich Nevelʹson,et al. Stochastic Approximation and Recursive Estimation , 1976 .
[39] Wei Xu,et al. Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent , 2011, ArXiv.
[40] Manfred K. Warmuth,et al. On the Worst-Case Analysis of Temporal-Difference Learning Algorithms , 2005, Machine Learning.
[41] Kenneth Lange,et al. Numerical analysis for statisticians , 1999 .
[42] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[43] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[44] Frederick R. Forst,et al. On robust estimation of the location parameter , 1980 .
[45] D. Titterington. Recursive Parameter Estimation Using Incomplete Data , 1984 .
[46] Rory A. Fisher,et al. Theory of Statistical Estimation , 1925, Mathematical Proceedings of the Cambridge Philosophical Society.
[47] Robert Tibshirani,et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.
[48] Edoardo M. Airoldi,et al. Statistical analysis of stochastic gradient methods for generalized linear models , 2014, ICML.
[49] G. Pflug,et al. Stochastic approximation and optimization of random systems , 1992 .
[50] V. Fabian. Asymptotically Efficient Stochastic Approximation; The RM Case , 1973 .
[51] Miguel Á. Carreira-Perpiñán,et al. On Contrastive Divergence Learning , 2005, AISTATS.
[52] John N. Tsitsiklis,et al. Neuro-dynamic programming: an overview , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[53] Yoshua Bengio,et al. Justifying and Generalizing Contrastive Divergence , 2009, Neural Computation.
[54] Peter L. Bartlett,et al. Implicit Online Learning , 2010, ICML.
[55] B. Ripley,et al. Robust Statistics , 2018, Encyclopedia of Mathematical Geosciences.
[56] Babak Hassibi,et al. The p-norm generalization of the LMS algorithm for adaptive filtering , 2003, IEEE Transactions on Signal Processing.
[57] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[58] Kenji Fukumizu,et al. Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons , 2000, Neural Computation.
[59] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[60] J. Sacks. Asymptotic Distribution of Stochastic Approximation Procedures , 1958 .
[61] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[62] C. Z. Wei. Multivariate Adaptive Stochastic Approximation , 1987 .
[63] Dale Schuurmans,et al. Implicit Online Learning with Kernels , 2006, NIPS.
[64] Geoffrey E. Hinton,et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.
[65] Hiroshi Nakagawa,et al. Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process , 2014, ICML.
[66] L. Younes. On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates , 1999 .
[67] M. Kendall. Statistical Methods for Research Workers , 1937, Nature.
[68] Abhijit Gosavi,et al. Reinforcement Learning: A Tutorial Survey and Recent Advances , 2009, INFORMS J. Comput..
[69] Yee Whye Teh,et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.
[70] Manfred K. Warmuth,et al. On the worst-case analysis of temporal-difference learning algorithms , 2004, Machine Learning.
[71] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[72] Noboru Murata,et al. A Statistical Study on On-line Learning , 1999 .
[73] Xi Chen,et al. Variance Reduction for Stochastic Gradient Optimization , 2013, NIPS.
[74] Andrew Gelman,et al. Handbook of Markov Chain Monte Carlo , 2011 .
[75] Lihong Li,et al. A worst-case comparison between temporal difference and residual gradient with linear function approximation , 2008, ICML '08.
[76] Martin Kiefel,et al. Quasi-Newton Methods: A New Direction , 2012, ICML.
[77] Max Welling,et al. Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget , 2013, ICML 2014.
[78] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm , 1976 .
[79] K. Lange. A gradient algorithm locally equivalent to the EM algorithm , 1995 .
[80] B. Schölkopf,et al. Modeling Human Motion Using Binary Latent Variables , 2007 .
[81] L. Rosasco,et al. Convergence of Stochastic Proximal Gradient Algorithm , 2014, Applied Mathematics & Optimization.
[82] Warren B. Powell,et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming , 2006, Machine Learning.
[83] E. L. Lehmann,et al. Theory of point estimation , 1950 .
[84] Dirk T. M. Slock,et al. On the convergence behavior of the LMS and the normalized LMS algorithms , 1993, IEEE Trans. Signal Process..
[85] D. Sakrison. Efficient recursive estimation; application to estimating the parameters of a covariance function , 1965 .
[86] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[87] R. Has’minskiĭ,et al. Stochastic Approximation and Recursive Estimation , 1976 .
[88] Chong Wang,et al. Stochastic variational inference , 2012, J. Mach. Learn. Res..
[89] Steven J. Nowlan,et al. Soft competitive adaptation: neural network learning algorithms based on fitting statistical mixtures , 1991 .
[90] Jeffrey S. Rosenthal,et al. Optimal Proposal Distributions and Adaptive MCMC , 2011 .
[91] J. H. Venter. An extension of the Robbins-Monro procedure , 1967 .
[92] Tong Zhang,et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms , 2004, ICML.
[93] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .