A Kernel Loss for Solving the Bellman Equation
Qiang Liu | Yihao Feng | Lihong Li
[1] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[2] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[3] Thomas G. Dietterich. Adaptive computation and machine learning , 1998 .
[4] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[5] Arthur Gretton,et al. A Kernel Test of Goodness of Fit , 2016, ICML.
[6] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[7] Dale Schuurmans,et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control , 2017, ICLR.
[8] R. J. Serfling. Approximation Theorems of Mathematical Statistics , 1980 .
[9] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[10] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..
[11] J. Stewart. Positive definite functions and generalizations, an historical survey , 1976 .
[12] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[13] Matthew W. Hoffman,et al. Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization , 2011, EWRL.
[14] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[15] Pravin Varaiya,et al. Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .
[16] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[17] Marek Petrik,et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms , 2015, UAI.
[18] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[19] Qiang Liu,et al. A Kernelized Stein Discrepancy for Goodness-of-fit Tests , 2016, ICML.
[20] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[21] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[22] Lihong Li,et al. Scalable Bilinear π Learning Using State and Action Features , 2018, ICML.
[23] Elman Mansimov,et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation , 2017, NIPS.
[24] Shie Mannor,et al. Regularized Policy Iteration with Nonparametric Function Spaces , 2016, J. Mach. Learn. Res..
[25] Roy Fox,et al. Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.
[26] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[27] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[28] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..
[29] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[30] A. Berlinet,et al. Reproducing kernel Hilbert spaces in probability and statistics , 2004 .
[31] Anthony Widjaja,et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.
[32] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[33] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[34] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[35] E. Beutner,et al. Deriving the asymptotic distribution of U- and V-statistics of dependent data using weighted empirical processes , 2012, arXiv:1207.5899.
[36] Bernhard Schölkopf,et al. Kernel Mean Embedding of Distributions: A Review and Beyond , 2016, Found. Trends Mach. Learn..
[37] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[38] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML.
[39] Le Song,et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation , 2017, ICML.
[40] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.
[41] Xin Xu,et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning , 2007, IEEE Transactions on Neural Networks.
[42] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[43] Mengdi Wang,et al. Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems , 2017, ArXiv.
[44] Ali H. Sayed,et al. Distributed Policy Evaluation Under Multiple Behavior Strategies , 2013, IEEE Transactions on Automatic Control.
[45] Dimitri P. Bertsekas,et al. Nonlinear Programming , 1997 .
[46] Le Song,et al. Learning from Conditional Distributions via Dual Embeddings , 2016, AISTATS.
[47] Xin Xu,et al. Kernel Least-Squares Temporal Difference Learning , 2006 .
[48] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[49] Bernhard Schölkopf,et al. A Kernel Two-Sample Test , 2012, J. Mach. Learn. Res..
[50] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[51] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML.
[52] M. Denker,et al. On U-statistics and v. Mises’ statistics for weakly dependent processes , 1983 .
[53] Bernhard Schölkopf,et al. Hilbert Space Embeddings and Metrics on Probability Measures , 2009, J. Mach. Learn. Res..
[54] Gavin Taylor,et al. Kernelized value function approximation for reinforcement learning , 2009, ICML.
[55] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[56] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[57] Le Song,et al. Boosting the Actor with Dual Critic , 2017, ICLR.