Stochastic Gradient Methods for Principled Estimation with Large Data Sets

Edoardo M. Airoldi | Panos Toulis

[1] O. Cappé,et al. On‐line expectation–maximization algorithm for latent data models , 2009 .

[2] M. Girolami,et al. Riemann manifold Langevin and Hamiltonian Monte Carlo methods , 2011, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[3] Chong Wang,et al. Stochastic variational inference , 2012, J. Mach. Learn. Res.

[4] Steven J. Nowlan,et al. Soft competitive adaptation: neural network learning algorithms based on fitting statistical mixtures , 1991 .

[5] P. Toulis,et al. Implicit stochastic gradient descent , 2014 .

[6] D. Ruppert,et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .

[7] Radford M. Neal. MCMC Using Hamiltonian Dynamics , 2011, 1206.1901.

[8] Dimitri P. Bertsekas,et al. Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems , 2014, Math. Oper. Res.

[9] C. G. Broyden. A Class of Methods for Solving Nonlinear Simultaneous Equations , 1965 .

[10] Edoardo M. Airoldi,et al. Towards Stability and Optimality in Stochastic Gradient Descent , 2015, AISTATS.

[11] Kenji Fukumizu,et al. Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons , 2000, Neural Computation.

[12] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[13] P. Lions,et al. Splitting Algorithms for the Sum of Two Nonlinear Operators , 1979 .

[14] J. Blum. Multidimensional Stochastic Approximation Methods , 1954 .

[15] J. Sacks. Asymptotic Distribution of Stochastic Approximation Procedures , 1958 .

[16] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res.

[17] Yann LeCun,et al. Large Scale Online Learning , 2003, NIPS.

[18] Wei Xu,et al. Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent , 2011, ArXiv.

[19] Tom Schaul,et al. No more pesky learning rates , 2012, ICML.

[20] Stephen P. Boyd,et al. Proximal Algorithms , 2013, Found. Trends Optim.

[21] J. Nagumo,et al. A learning method for system identification , 1967, IEEE Transactions on Automatic Control.

[22] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[23] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .

[24] P. Green. Iteratively reweighted least squares for maximum likelihood estimation , 1984 .

[25] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn.

[26] D. Ruppert. A New Dynamic Stochastic Approximation Procedure , 1979 .

[27] Donald Geman,et al. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images , 1984 .

[28] Noureddine El Karoui. Spectrum estimation for large dimensional covariance matrices using random matrix theory , 2006, math/0609418.

[29] Shin Ishii,et al. On-line EM Algorithm for the Normalized Gaussian Network , 2000, Neural Computation.

[30] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.

[31] B. Schölkopf,et al. Modeling Human Motion Using Binary Latent Variables , 2007 .

[32] L. Rosasco,et al. Convergence of Stochastic Proximal Gradient Algorithm , 2014, Applied Mathematics & Optimization.

[33] Geoffrey E. Hinton,et al. Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[34] Babak Hassibi,et al. The p-norm generalization of the LMS algorithm for adaptive filtering , 2003, IEEE Transactions on Signal Processing.

[35] Andrzej Cichocki,et al. Stability Analysis of Learning Algorithms for Blind Source Separation , 1997, Neural Networks.

[36] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[37] Warren B. Powell,et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming , 2006, Machine Learning.

[38] E. L. Lehmann,et al. Theory of point estimation , 1950 .

[39] Dirk T. M. Slock,et al. On the convergence behavior of the LMS and the normalized LMS algorithms , 1993, IEEE Trans. Signal Process.

[40] Eric Moulines,et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) , 2013, NIPS.

[41] D. Sakrison. Efficient recursive estimation; application to estimating the parameters of a covariance function , 1965 .

[42] Martin Kiefel,et al. Quasi-Newton Methods: A New Direction , 2012, ICML.

[43] R. Has’minskiĭ,et al. Stochastic Approximation and Recursive Estimation , 1976 .

[44] Max Welling,et al. Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget , 2013, ICML 2014.

[45] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm , 1976 .

[46] K. Lange. A gradient algorithm locally equivalent to the EM algorithm , 1995 .

[47] V. Fabian. Asymptotically Efficient Stochastic Approximation; The RM Case , 1973 .

[48] Bernard Widrow,et al. Adaptive switching circuits , 1988 .

[49] Lin Xiao,et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction , 2014, SIAM J. Optim.

[50] E. Airoldi,et al. Stochastic gradient descent methods for estimation with large data sets , 2015, 1509.06459.

[51] Yee Whye Teh,et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.

[52] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .

[53] Patrick Gallinari,et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res.

[54] W. Gardner. Learning characteristics of stochastic-gradient-descent algorithms: A general study, analysis, and critique , 1984 .

[55] Miguel Á. Carreira-Perpiñán,et al. On Contrastive Divergence Learning , 2005, AISTATS.

[56] R. Fisher,et al. On the Mathematical Foundations of Theoretical Statistics , 1922 .

[57] H. Robbins. A Stochastic Approximation Method , 1951 .

[58] Peter L. Bartlett,et al. Implicit Online Learning , 2010, ICML.

[59] Xi Chen,et al. Variance Reduction for Stochastic Gradient Optimization , 2013, NIPS.

[60] J. H. Venter. An extension of the Robbins-Monro procedure , 1967 .

[61] Tong Zhang,et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms , 2004, ICML.

[62] D. Titterington. Recursive Parameter Estimation Using Incomplete Data , 1984 .

[63] Edoardo M. Airoldi,et al. Statistical analysis of stochastic gradient methods for generalized linear models , 2014, ICML.

[64] G. Pflug,et al. Stochastic approximation and optimization of random systems , 1992 .

[65] Dimitri P. Bertsekas,et al. Incremental proximal methods for large scale convex optimization , 2011, Math. Program.

[66] John N. Tsitsiklis,et al. Neuro-dynamic programming: an overview , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.

[67] Yoshua Bengio,et al. Justifying and Generalizing Contrastive Divergence , 2009, Neural Computation.

[68] Krzysztof A. Krakowski,et al. A Geometric View of Non-Linear On-Line Stochastic Gradient Descent , 2007 .

[69] Geoffrey E. Hinton,et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[70] James C. Spall,et al. Introduction to stochastic search and optimization - estimation, simulation, and control , 2003, Wiley-Interscience series in discrete mathematics and optimization.

[71] Robert Tibshirani,et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[72] M. Kendall. Statistical Methods for Research Workers , 1937, Nature.

[73] Abhijit Gosavi,et al. Reinforcement Learning: A Tutorial Survey and Recent Advances , 2009, INFORMS J. Comput.

[74] Jalal Almhana,et al. Online EM algorithm for mixture with application to internet traffic modeling , 2004 .

[75] Edoardo M. Airoldi,et al. Scalable estimation strategies based on stochastic approximations: classical results and new insights , 2015, Statistics and Computing.

[76] N. Pillai,et al. Ergodicity of Approximate MCMC Chains with Applications to Large Data Sets , 2014, 1405.0182.

[77] Edoardo M. Airoldi,et al. Stability and optimality in stochastic gradient descent , 2015, ArXiv.

[78] H. Robbins,et al. Adaptive Design and Stochastic Approximation , 1979 .

[79] Edoardo M. Airoldi,et al. Implicit Temporal Differences , 2014, ArXiv.

[80] J. Spall. Adaptive stochastic approximation by the simultaneous perturbation method , 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).

[81] V. Fabian. On Asymptotic Normality in Stochastic Approximation , 1968 .

[82] Mark W. Schmidt,et al. Minimizing finite sums with the stochastic average gradient , 2013, Mathematical Programming.

[83] Olivier Cappé. Online EM Algorithm for Hidden Markov Models , 2009, 0908.2359.

[84] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[85] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.

[86] Léon Bottou,et al. On-line learning for very large data sets , 2005 .

[87] P. Dupuis,et al. On sampling controlled stochastic approximation , 1991 .

[88] Simon Günter,et al. A Stochastic Quasi-Newton Method for Online Convex Optimization , 2007, AISTATS.

[89] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[90] C. Z. Wei. Multivariate Adaptive Stochastic Approximation , 1987 .

[91] Hiroshi Nakagawa,et al. Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process , 2014, ICML.

[92] L. Younes. On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates , 1999 .