A Game Theoretic Framework for Model Based Reinforcement Learning
[1] K. Narendra, et al. Persistent excitation in adaptive systems, 1987.
[2] Lennart Ljung, et al. System Identification: Theory for the User, 1987.
[3] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[5] J. Doyle, et al. Robust and optimal control, 1995, Proceedings of 35th IEEE Conference on Decision and Control.
[6] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[7] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[8] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[9] Harold R. Parks, et al. The Implicit Function Theorem, 2002.
[10] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[11] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[12] John Langford, et al. Exploration in Metric State Spaces, 2003, ICML.
[13] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[14] J. Tsitsiklis, et al. Convergence rate of linear two-time-scale stochastic approximation, 2004, math/0405287.
[15] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[16] Pieter Abbeel, et al. Exploration and apprenticeship learning in reinforcement learning, 2005, ICML.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[18] E. Todorov, et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, 2005, Proceedings of the 2005 American Control Conference.
[19] Koby Crammer, et al. Analysis of Representations for Domain Adaptation, 2006, NIPS.
[20] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.
[21] Patrice Marcotte, et al. An overview of bilevel optimization, 2007, Ann. Oper. Res.
[22] Richard M. Murray, et al. Feedback Systems: An Introduction for Scientists and Engineers, 2008.
[23] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[24] Heinrich von Stackelberg. Market Structure and Equilibrium, 2010.
[25] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[26] H. Brendan McMahan, et al. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization, 2011, AISTATS.
[27] J. Andrew Bagnell, et al. Agnostic System Identification for Model-Based Reinforcement Learning, 2012, ICML.
[28] Yuval Tassa, et al. Synthesis and stabilization of complex behaviors through online trajectory optimization, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[29] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[30] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn.
[31] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[32] Emanuel Todorov, et al. Combining the benefits of function approximation and trajectory optimization, 2014, Robotics: Science and Systems.
[33] Mi-Ching Tsai, et al. Robust and Optimal Control, 2014.
[34] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[35] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[36] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[37] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[38] Martial Hebert, et al. Improving Multi-Step Prediction of Learned Time Series Models, 2015, AAAI.
[39] Samy Bengio, et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, 2015, NIPS.
[40] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[41] Sergey Levine, et al. Optimal control with learned local models: Application to dexterous manipulation, 2016, IEEE International Conference on Robotics and Automation (ICRA).
[42] Kate Saenko, et al. Return of Frustratingly Easy Domain Adaptation, 2015, AAAI.
[43] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[44] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[45] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[46] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[47] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.
[48] Trevor Darrell, et al. Adversarial Discriminative Domain Adaptation, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] David Pfau, et al. Unrolled Generative Adversarial Networks, 2016, ICLR.
[50] Sergey Levine, et al. Path integral guided policy search, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[51] Nolan Wagener, et al. Information theoretic MPC for model-based reinforcement learning, 2017, IEEE International Conference on Robotics and Automation (ICRA).
[52] Balaraman Ravindran, et al. EPOpt: Learning Robust Neural Network Policies Using Model Ensembles, 2016, ICLR.
[53] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[54] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[55] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[56] Honglak Lee, et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, 2018, NeurIPS.
[57] Anca D. Dragan, et al. Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state, 2018, Auton. Robots.
[58] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[59] Alexandre M. Bayen, et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines, 2018, ICLR.
[60] Shimon Whiteson, et al. Learning with Opponent-Learning Awareness, 2017, AAMAS.
[61] Pieter Abbeel, et al. Model-Ensemble Trust-Region Policy Optimization, 2018, ICLR.
[62] Byron Boots, et al. Dual Policy Iteration, 2018, NeurIPS.
[63] Sergey Levine, et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.
[64] J. Schulman, et al. Reptile: A Scalable Metalearning Algorithm, 2018.
[65] Joshua Achiam, et al. On First-Order Meta-Learning Algorithms, 2018, ArXiv.
[66] Marcin Andrychowicz, et al. Overcoming Exploration in Reinforcement Learning with Demonstrations, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[67] Sergey Levine, et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning, 2018, ArXiv.
[68] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[69] Sergey Levine, et al. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations, 2017, Robotics: Science and Systems.
[70] Shalabh Bhatnagar, et al. Two Timescale Stochastic Approximation with Controlled Markov Noise, 2015, Math. Oper. Res.
[71] S. Levine, et al. ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots, 2019, CoRL.
[72] Pieter Abbeel, et al. Benchmarking Model-Based Reinforcement Learning, 2019, ArXiv.
[73] Henry Zhu, et al. Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost, 2018, 2019 International Conference on Robotics and Automation (ICRA).
[74] Sham M. Kakade, et al. Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control, 2018, ICLR.
[75] Sergey Levine, et al. Deep Dynamics Models for Learning Dexterous Manipulation, 2019, CoRL.
[76] Lin F. Yang, et al. On the Optimality of Sparse Model-Based Planning for Markov Decision Processes, 2019, ArXiv.
[77] Lillian J. Ratliff, et al. Convergence of Learning Dynamics in Stackelberg Games, 2019, ArXiv.
[78] Deepak Pathak, et al. Self-Supervised Exploration via Disagreement, 2019, ICML.
[79] Sergey Levine, et al. When to Trust Your Model: Model-Based Policy Optimization, 2019, NeurIPS.
[80] Sergey Levine, et al. Meta-Learning with Implicit Gradients, 2019, NeurIPS.
[81] Sham M. Kakade, et al. Provably Efficient Maximum Entropy Exploration, 2018, ICML.
[82] Florian Schäfer, et al. Competitive Gradient Descent, 2019, NeurIPS.
[83] Yuandong Tian, et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees, 2018, ICLR.
[84] S. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[85] Bo Dai, et al. Reinforcement Learning via Fenchel-Rockafellar Duality, 2020, ArXiv.
[86] Jimmy Ba, et al. On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach, 2019, ICLR.
[87] Michael I. Jordan, et al. What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization?, 2019, ICML.