Two-Timescale Networks for Nonlinear Value Function Approximation
Wesley Chung | Somjit Nath | Ajin Joseph | Martha White
[1] Huizhen Yu, et al. On Convergence of Emphatic Temporal-Difference Learning, 2015, COLT.
[2] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[3] Richard S. Sutton, et al. Off-policy TD(λ) with a true online equivalence, 2014, UAI.
[4] S. Bhatnagar, et al. Stochastic recursive inclusion in two timescales with an application to the Lagrangian dual problem, 2015.
[5] Richard S. Sutton, et al. A Deeper Look at Planning as Learning from Replay, 2015, ICML.
[6] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[7] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[8] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res..
[9] Richard S. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[10] Tamer Basar, et al. Analysis of Recursive Stochastic Algorithms, 2001.
[11] Andrew G. Barto, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2005, Machine Learning.
[12] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res..
[13] Elizabeth L. Wilmer, et al. Markov Chains and Mixing Times, 2008.
[14] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[15] Honglak Lee, et al. Action-Conditional Video Prediction using Deep Networks in Atari Games, 2015, NIPS.
[16] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[17] Nando de Freitas, et al. Playing hard exploration games by watching YouTube, 2018, NeurIPS.
[18] Richard S. Sutton, et al. True Online TD(λ), 2014, ICML.
[19] Nikos Komodakis, et al. Unsupervised Representation Learning by Predicting Image Rotations, 2018, ICLR.
[20] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[21] Lawrence Carin, et al. Linear Feature Encoding for Reinforcement Learning, 2016, NIPS.
[22] Josef Hofbauer, et al. Stochastic Approximations and Differential Inclusions, 2005, SIAM J. Control Optim..
[23] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[24] Martha White, et al. Effective sketching methods for value function approximation, 2017, UAI.
[25] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[26] H. J. Kushner, D. S. Clark. Stochastic Approximation Methods for Constrained and Unconstrained Systems, 1978.
[27] Sanjiv Kumar, et al. On the Convergence of Adam and Beyond, 2018, ICLR.
[28] George Konidaris, et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis, 2011, AAAI.
[29] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[30] Shie Mannor, et al. Shallow Updates for Deep Reinforcement Learning, 2017, NIPS.
[31] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res..
[32] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[33] Le Song, et al. Smoothed Dual Embedding Control, 2017, ArXiv.
[34] Shie Mannor, et al. Adaptive Bases for Reinforcement Learning, 2010, ECML/PKDD.
[35] Richard S. Sutton, et al. Multi-step Off-policy Learning Without Importance Sampling Ratios, 2017, ArXiv.
[36] Martha White, et al. Investigating Practical Linear Temporal Difference Learning, 2016, AAMAS.
[37] Martha White. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[38] Han-Fu Chen. Stochastic Approximation and Its Applications, 2002.
[39] Vivek S. Borkar, et al. Actor-Critic Algorithms with Online Feature Adaptation, 2016, ACM Trans. Model. Comput. Simul..
[40] Dimitri P. Bertsekas, et al. Basis function adaptation methods for cost approximation in MDP, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[41] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, ArXiv.
[42] Vivek S. Borkar, et al. Feature Search in the Grassmanian in Online Reinforcement Learning, 2013, IEEE Journal of Selected Topics in Signal Processing.
[43] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[44] V. Borkar. Stochastic approximation with two time scales, 1997.
[45] Shie Mannor, et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning, 2005, Ann. Oper. Res..
[46] Richard S. Sutton, et al. Off-policy learning based on weighted importance sampling with linear computational complexity, 2015, UAI.
[47] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[48] Csaba Szepesvári. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.