Stochastic Gradient Methods for Principled Estimation with Large Data Sets
[1] O. Cappé, et al. On-line expectation–maximization algorithm for latent data models, 2009.
[2] M. Girolami, et al. Riemann manifold Langevin and Hamiltonian Monte Carlo methods, 2011, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[3] Chong Wang, et al. Stochastic variational inference, 2012, J. Mach. Learn. Res.
[4] Steven J. Nowlan, et al. Soft competitive adaptation: neural network learning algorithms based on fitting statistical mixtures, 1991.
[5] P. Toulis, et al. Implicit stochastic gradient descent, 2014.
[6] D. Ruppert, et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.
[7] Radford M. Neal. MCMC Using Hamiltonian Dynamics, 2011, arXiv:1206.1901.
[8] Dimitri P. Bertsekas, et al. Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems, 2014, Math. Oper. Res.
[9] C. G. Broyden. A Class of Methods for Solving Nonlinear Simultaneous Equations, 1965.
[10] Edoardo M. Airoldi, et al. Towards Stability and Optimality in Stochastic Gradient Descent, 2015, AISTATS.
[11] Kenji Fukumizu, et al. Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons, 2000, Neural Computation.
[12] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[13] P. Lions, et al. Splitting Algorithms for the Sum of Two Nonlinear Operators, 1979.
[14] J. Blum. Multidimensional Stochastic Approximation Methods, 1954.
[15] J. Sacks. Asymptotic Distribution of Stochastic Approximation Procedures, 1958.
[16] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[17] Yann LeCun, et al. Large Scale Online Learning, 2003, NIPS.
[18] Wei Xu, et al. Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent, 2011, arXiv.
[19] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[20] Stephen P. Boyd, et al. Proximal Algorithms, 2013, Found. Trends Optim.
[21] J. Nagumo, et al. A learning method for system identification, 1967, IEEE Transactions on Automatic Control.
[22] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[23] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.
[24] P. Green. Iteratively reweighted least squares for maximum likelihood estimation, 1984.
[25] Yoshua Bengio, et al. Learning Deep Architectures for AI, 2007, Found. Trends Mach. Learn.
[26] D. Ruppert. A New Dynamic Stochastic Approximation Procedure, 1979.
[27] Donald Geman, et al. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, 1984.
[28] Noureddine El Karoui. Spectrum estimation for large dimensional covariance matrices using random matrix theory, 2006, arXiv:math/0609418.
[29] Shin Ishii, et al. On-line EM Algorithm for the Normalized Gaussian Network, 2000, Neural Computation.
[30] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[31] B. Schölkopf, et al. Modeling Human Motion Using Binary Latent Variables, 2007.
[32] L. Rosasco, et al. Convergence of Stochastic Proximal Gradient Algorithm, 2014, Applied Mathematics & Optimization.
[33] Geoffrey E. Hinton, et al. Restricted Boltzmann machines for collaborative filtering, 2007, ICML '07.
[34] Babak Hassibi, et al. The p-norm generalization of the LMS algorithm for adaptive filtering, 2003, IEEE Transactions on Signal Processing.
[35] Andrzej Cichocki, et al. Stability Analysis of Learning Algorithms for Blind Source Separation, 1997, Neural Networks.
[36] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm, plus discussions on the paper, 1977.
[37] Warren B. Powell, et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, 2006, Machine Learning.
[38] E. L. Lehmann, et al. Theory of point estimation, 1950.
[39] Dirk T. M. Slock, et al. On the convergence behavior of the LMS and the normalized LMS algorithms, 1993, IEEE Trans. Signal Process.
[40] Eric Moulines, et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), 2013, NIPS.
[41] D. Sakrison. Efficient recursive estimation; application to estimating the parameters of a covariance function, 1965.
[42] Martin Kiefel, et al. Quasi-Newton Methods: A New Direction, 2012, ICML.
[43] R. Has’minskiĭ, et al. Stochastic Approximation and Recursive Estimation, 1976.
[44] Max Welling, et al. Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget, 2013, ICML 2014.
[45] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm, 1976.
[46] K. Lange. A gradient algorithm locally equivalent to the EM algorithm, 1995.
[47] V. Fabian. Asymptotically Efficient Stochastic Approximation; The RM Case, 1973.
[48] Bernard Widrow, et al. Adaptive switching circuits, 1988.
[49] Lin Xiao, et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction, 2014, SIAM J. Optim.
[50] E. Airoldi, et al. Stochastic gradient descent methods for estimation with large data sets, 2015, arXiv:1509.06459.
[51] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.
[52] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[53] Patrick Gallinari, et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent, 2009, J. Mach. Learn. Res.
[54] W. Gardner. Learning characteristics of stochastic-gradient-descent algorithms: A general study, analysis, and critique, 1984.
[55] Miguel Á. Carreira-Perpiñán, et al. On Contrastive Divergence Learning, 2005, AISTATS.
[56] R. Fisher, et al. On the Mathematical Foundations of Theoretical Statistics, 1922.
[57] H. Robbins. A Stochastic Approximation Method, 1951.
[58] Peter L. Bartlett, et al. Implicit Online Learning, 2010, ICML.
[59] Xi Chen, et al. Variance Reduction for Stochastic Gradient Optimization, 2013, NIPS.
[60] J. H. Venter. An extension of the Robbins-Monro procedure, 1967.
[61] Tong Zhang, et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms, 2004, ICML.
[62] D. Titterington. Recursive Parameter Estimation Using Incomplete Data, 1984.
[63] Edoardo M. Airoldi, et al. Statistical analysis of stochastic gradient methods for generalized linear models, 2014, ICML.
[64] G. Pflug, et al. Stochastic approximation and optimization of random systems, 1992.
[65] Dimitri P. Bertsekas, et al. Incremental proximal methods for large scale convex optimization, 2011, Math. Program.
[66] John N. Tsitsiklis, et al. Neuro-dynamic programming: an overview, 1995, Proceedings of the 1995 34th IEEE Conference on Decision and Control.
[67] Yoshua Bengio, et al. Justifying and Generalizing Contrastive Divergence, 2009, Neural Computation.
[68] Krzysztof A. Krakowski, et al. A Geometric View of Non-Linear On-Line Stochastic Gradient Descent, 2007.
[69] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.
[70] James C. Spall, et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, 2003, Wiley-Interscience Series in Discrete Mathematics and Optimization.
[71] Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.
[72] M. Kendall. Statistical Methods for Research Workers, 1937, Nature.
[73] Abhijit Gosavi, et al. Reinforcement Learning: A Tutorial Survey and Recent Advances, 2009, INFORMS J. Comput.
[74] Jalal Almhana, et al. Online EM algorithm for mixture with application to internet traffic modeling, 2004.
[75] Edoardo M. Airoldi, et al. Scalable estimation strategies based on stochastic approximations: classical results and new insights, 2015, Statistics and Computing.
[76] N. Pillai, et al. Ergodicity of Approximate MCMC Chains with Applications to Large Data Sets, 2014, arXiv:1405.0182.
[77] Edoardo M. Airoldi, et al. Stability and optimality in stochastic gradient descent, 2015, arXiv.
[78] H. Robbins, et al. Adaptive Design and Stochastic Approximation, 1979.
[79] Edoardo M. Airoldi, et al. Implicit Temporal Differences, 2014, arXiv.
[80] J. Spall. Adaptive stochastic approximation by the simultaneous perturbation method, 1998, Proceedings of the 37th IEEE Conference on Decision and Control.
[81] V. Fabian. On Asymptotic Normality in Stochastic Approximation, 1968.
[82] Mark W. Schmidt, et al. Minimizing finite sums with the stochastic average gradient, 2013, Mathematical Programming.
[83] Olivier Cappé. Online EM Algorithm for Hidden Markov Models, 2009, arXiv:0908.2359.
[84] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.
[85] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[86] Léon Bottou, et al. On-line learning for very large data sets, 2005.
[87] P. Dupuis, et al. On sampling controlled stochastic approximation, 1991.
[88] Simon Günter, et al. A Stochastic Quasi-Newton Method for Online Convex Optimization, 2007, AISTATS.
[89] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[90] C. Z. Wei. Multivariate Adaptive Stochastic Approximation, 1987.
[91] Hiroshi Nakagawa, et al. Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process, 2014, ICML.
[92] L. Younes. On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates, 1999.