[1] Yu-Xiang Wang, et al. Towards Instance-Optimal Offline Reinforcement Learning with Pessimism, 2021, NeurIPS.
[2] D. Freedman. On Tail Probabilities for Martingales, 1975.
[3] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[4] Xian Wu, et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model, 2018, NeurIPS.
[5] E. L. Lehmann, et al. Theory of Point Estimation, 1998.
[6] Jianqing Fan, et al. Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model, 2021, ArXiv.
[7] H. Robbins. A Stochastic Approximation Method, 1951.
[8] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation, 2019, ICML.
[9] Myung Hwan Seo, et al. Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling, 2021, ArXiv.
[10] Martin J. Wainwright, et al. Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning, 2021, ArXiv.
[11] P. Hall, et al. Martingale Limit Theory and Its Application, 1980.
[12] T. Moore. A Theory of Cramér-Rao Bounds for Constrained Parametric Models, 2010.
[13] Moritz Jirak, et al. On Weak Invariance Principles for Partial Sums, 2017.
[14] Csaba Szepesvári, et al. The Asymptotic Convergence-Rate of Q-learning, 1997, NIPS.
[15] W. Newey, et al. Semiparametric Efficiency Bounds, 1990.
[16] Yuancheng Zhu, et al. Uncertainty Quantification for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent, 2018, arXiv:1802.04876.
[17] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[18] Martin J. Wainwright, et al. Optimal oracle inequalities for solving projected fixed-point equations, 2020, ArXiv.
[19] Siva Theja Maguluri, et al. Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex Envelopes, 2020, ArXiv.
[20] Niao He, et al. A Unified Switching System Perspective and O.D.E. Analysis of Q-Learning Algorithms, 2019, ArXiv.
[21] D. Ruppert, et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.
[22] Zhihua Zhang, et al. Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics, 2021.
[23] Sean P. Meyn, et al. An analysis of reinforcement learning with function approximation, 2008, ICML '08.
[24] Yu-Xiang Wang, et al. Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning, 2020, AISTATS.
[25] Martin J. Wainwright, et al. Stochastic approximation with cone-contractive operators: Sharp ℓ∞-bounds for Q-learning, 2019, ArXiv.
[26] Michael R. Kosorok, et al. Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning, 2016, Journal of the American Statistical Association.
[27] Adam Wierman, et al. Finite-Time Analysis of Asynchronous Stochastic Approximation and Q-Learning, 2020, COLT.
[28] John N. Tsitsiklis. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[29] Xian Wu, et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes, 2017, SODA.
[30] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[31] Thinh T. Doan, et al. Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning, 2019, Autom.
[32] Devavrat Shah, et al. Q-learning with Nearest Neighbors, 2018, NeurIPS.
[33] Xi Chen, et al. Online Covariance Matrix Estimation in Stochastic Gradient Descent, 2020, Journal of the American Statistical Association.
[34] Csaba Szepesvári, et al. Bootstrapping Statistical Inference for Off-Policy Evaluation, 2021, ArXiv.
[35] Yuantao Gu, et al. Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction, 2022, IEEE Transactions on Information Theory.
[36] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[37] Karthikeyan Shanmugam, et al. A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants, 2021, ArXiv.
[38] Tor Lattimore, et al. Near-optimal PAC bounds for discounted MDPs, 2014, Theor. Comput. Sci.
[39] S. Zhang, et al. Statistical inference of the value function for reinforcement learning in infinite-horizon settings, 2020, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[40] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[41] Martin J. Wainwright, et al. On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration, 2020, COLT.
[42] Mark W. Schmidt, et al. Variance-Reduced Methods for Machine Learning, 2020, Proceedings of the IEEE.
[43] Paolo Paruolo, et al. Simple Robust Testing of Regression Hypotheses: A Comment, 2001.
[44] R. Srikant, et al. Error bounds for constant step-size Q-learning, 2012, Syst. Control. Lett.
[45] Lin F. Yang, et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, 2020, COLT.
[46] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[47] Guanghui Lan, et al. Accelerated and instance-optimal policy evaluation with linear function approximation, 2021, ArXiv.
[48] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[49] Xin T. Tong, et al. Statistical inference for model parameters in stochastic gradient descent, 2016, The Annals of Statistics.
[50] Donghwan Lee, et al. Target-Based Temporal Difference Learning, 2019, ICML.
[51] Martin J. Wainwright, et al. Variance-reduced Q-learning is minimax optimal, 2019, ArXiv.
[52] Michael I. Jordan, et al. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning technical report, 1996.
[53] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[54] Masatoshi Uehara, et al. Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes, 2019, J. Mach. Learn. Res.
[55] H. Alzer. Inequalities for the gamma function, 1999.
[56] Jalaj Bhandari, et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation, 2018, COLT.
[57] Martin J. Wainwright, et al. Instance-Dependent ℓ∞-Bounds for Policy Evaluation in Tabular Reinforcement Learning, 2021, IEEE Transactions on Information Theory.
[58] Changxiao Cai, et al. Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis, 2021.
[59] Jean Jacod, et al. Skorokhod Topology and Convergence of Processes, 2003.
[60] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[61] Martin J. Wainwright, et al. Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis, 2020, SIAM J. Math. Data Sci.
[62] Xiangyu Chang, et al. Statistical Estimation and Inference via Local SGD in Federated Learning, 2021, ArXiv.
[63] Yuantao Gu, et al. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model, 2020, NeurIPS.
[64] A. Tsiatis. Semiparametric Theory and Missing Data, 2006.
[65] Yishay Mansour, et al. Learning Rates for Q-learning, 2004, J. Mach. Learn. Res.
[66] C. J. C. H. Watkins. Learning from Delayed Rewards, 1989, PhD thesis, University of Cambridge.
[67] Xiangyang Ji, et al. Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity, 2020, ICML.