Alexander J. Smola | Kavosh Asadi | Michael L. Littman | Rasool Fakoor | Omer Gottesman
[1] James G. Scott, et al. Proximal Algorithms in Statistics and Machine Learning, 2015, ArXiv.
[2] Harm van Seijen, et al. Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning, 2019, NeurIPS.
[3] Marc Teboulle, et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009, SIAM J. Imaging Sci.
[4] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[5] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[6] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[7] Alexander J. Smola, et al. P3O: Policy-on Policy-off Policy Optimization, 2019, UAI.
[8] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[9] Dimitri P. Bertsekas, et al. Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey, 2015, ArXiv.
[10] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[11] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[12] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[13] Alexander J. Smola, et al. Meta-Q-Learning, 2020, ICLR.
[14] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[15] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[16] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[17] Matteo Hessel, et al. Deep Reinforcement Learning and the Deadly Triad, 2018, ArXiv.
[18] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[19] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2), 1983.
[20] J. Moreau. Proximité et dualité dans un espace hilbertien, 1965.
[21] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[22] Bo Liu, et al. Sparse Q-learning with Mirror Descent, 2012, UAI.
[23] Richard S. Sutton, et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[24] Huan Li, et al. Accelerated Proximal Gradient Methods for Nonconvex Programming, 2015, NIPS.
[25] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[26] P. Schrimpf, et al. Dynamic Programming, 2011.
[27] Pratik Chaudhari, et al. Proximal Deterministic Policy Gradient, 2020, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[28] Doina Precup, et al. Exponentiated Gradient Methods for Reinforcement Learning, 1997, ICML.
[29] Heinz H. Bauschke, et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2011, CMS Books in Mathematics.
[30] Amir Massoud Farahmand, et al. Action-Gap Phenomenon in Reinforcement Learning, 2011, NIPS.
[31] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[32] Alexander J. Smola, et al. Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization, 2016, NIPS.
[33] J. Moreau. Fonctions convexes duales et points proximaux dans un espace hilbertien, 1962.
[34] M. Fukushima, et al. A generalized proximal point algorithm for certain non-convex minimization problems, 1981.
[35] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, ArXiv.
[36] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[37] Mohammad Ghavamzadeh, et al. Mirror Descent Policy Optimization, 2020, ArXiv.
[38] Kavosh Asadi, et al. DeepMellow: Removing the Need for a Target Network in Deep Q-Learning, 2019, IJCAI.
[39] Alexander J. Smola, et al. Doubly Robust Covariate Shift Correction, 2015, AAAI.
[40] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, ArXiv.
[41] Dimitri P. Bertsekas, et al. Incremental proximal methods for large scale convex optimization, 2011, Math. Program.
[42] Kavosh Asadi, et al. Deep Radial-Basis Value Functions for Continuous Control, 2021, AAAI.
[43] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[44] Jian Li, et al. A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization, 2018, NeurIPS.
[45] Donghwan Lee, et al. Target-Based Temporal Difference Learning, 2019, ICML.
[46] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[47] Patrick L. Combettes, et al. Proximal Splitting Methods in Signal Processing, 2009, Fixed-Point Algorithms for Inverse Problems in Science and Engineering.
[48] Gergely Neu, et al. Logistic Q-Learning, 2020, AISTATS.
[49] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[50] Shimon Whiteson, et al. Breaking the Deadly Triad with a Target Network, 2021, ICML.
[51] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[52] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.
[53] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[54] Geoffrey Zweig, et al. Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning, 2017, ACL.
[55] Hao He, et al. Trust Region-Guided Proximal Policy Optimization, 2019, NeurIPS.
[56] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[57] Heinz H. Bauschke, et al. Firmly Nonexpansive Mappings and Maximally Monotone Operators: Correspondence and Duality, 2011, ArXiv:1101.4688.
[58] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm, 1976.
[59] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.