[1] W. Feller. An Introduction to Probability Theory and Its Applications, Volume II, 1966, Wiley.
[2] V. B. Tadic, et al. On the almost sure rate of convergence of linear stochastic approximation algorithms, 2004, IEEE Transactions on Information Theory.
[3] R. Srikant, et al. Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning, 2019, COLT.
[4] Matthieu Lerasle, et al. Robust machine learning by median-of-means: Theory and practice, 2019.
[5] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[6] T. Cai, et al. An adaptation theory for nonparametric confidence intervals, 2004, arXiv:math/0503662.
[7] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[8] Shie Mannor, et al. "How hard is my MDP?" The distribution-norm to the rescue, 2014, NIPS.
[9] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[10] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[11] Martin J. Wainwright, et al. The Local Geometry of Testing in Ellipses: Tight Control via Localized Kolmogorov Widths, 2017, IEEE Transactions on Information Theory.
[12] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization, 1983, Wiley.
[13] W. Feller. An Introduction to Probability Theory and Its Applications, Volume I, 1950, Wiley.
[14] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[15] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[16] Martin J. Wainwright, et al. Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning, 2019, arXiv:1905.06265.
[17] Shie Mannor, et al. Reinforcement learning in the presence of rare events, 2008, ICML '08.
[18] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[19] Kiyosi Itô, et al. Essentials of Stochastic Processes, 2006.
[20] V. Borkar. Asynchronous Stochastic Approximations, 1998.
[21] Csaba Szepesvári, et al. Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go?, 2018, AISTATS.
[22] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[23] Tor Lattimore, et al. Near-optimal PAC bounds for discounted MDPs, 2014, Theor. Comput. Sci.
[24] Hilbert J. Kappen, et al. Speedy Q-Learning, 2011, NIPS.
[25] Steven J. Bradtke and Andrew G. Barto. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[27] Jonathan P. How, et al. Improving PAC Exploration Using the Median of Means, 2016, NIPS.
[28] John D. Lafferty, et al. Local Minimax Complexity of Stochastic Convex Optimization, 2016, NIPS.
[29] P. Hall. The Bootstrap and Edgeworth Expansion, 1992, Springer.
[30] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[31] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[32] Nan Jiang, et al. Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon, 2018, COLT.
[33] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[34] Xian Wu, et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes, 2017, SODA.
[35] Martin J. Wainwright, et al. Variance-reduced Q-learning is minimax optimal, 2019, arXiv.
[36] D. P. Bertsekas. Dynamic Programming and Stochastic Control, 1976, Academic Press.
[37] Andrew Vince. A rearrangement inequality and the permutahedron, 1990.
[38] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap, 1993, Chapman & Hall.
[39] Yishay Mansour, et al. Learning Rates for Q-learning, 2004, J. Mach. Learn. Res.
[40] Martin J. Wainwright, et al. Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning, 2019, arXiv.
[41] R. A. Horn and C. R. Johnson. Matrix Analysis, 1985, Cambridge University Press.
[42] S. Kakade, et al. Reinforcement Learning: Theory and Algorithms, 2019.
[43] Yuxin Chen, et al. Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution, 2017, Found. Comput. Math.
[44] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[45] Yanjun Han, et al. Minimax Estimation of Functionals of Discrete Distributions, 2014, IEEE Transactions on Information Theory.
[46] Lin F. Yang, et al. On the Optimality of Sparse Model-Based Planning for Markov Decision Processes, 2019, arXiv.
[47] Richard S. Sutton, et al. Reinforcement Learning of Local Shape in the Game of Go, 2007, IJCAI.
[48] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[49] L. Sherry, et al. Estimating Taxi-out times with a reinforcement learning algorithm, 2008, IEEE/AIAA 27th Digital Avionics Systems Conference.
[50] Xian Wu, et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model, 2018, NeurIPS.
[51] Martin J. Wainwright, et al. High-Dimensional Statistics, 2019.
[52] Leslie G. Valiant, et al. Random Generation of Combinatorial Structures from a Uniform Distribution, 1986, Theor. Comput. Sci.
[53] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.