Yinlam Chow | Mohammad Ghavamzadeh | Brandon Cui | MoonKyung Ryu
[1] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[2] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[3] Sergey Levine, et al. Unsupervised Exploration with Deep Model-Based Reinforcement Learning, 2018.
[4] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.
[5] Sergey Levine, et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, 2018, arXiv.
[6] V. Kaul, et al. Planning, 2012.
[7] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[8] Luc De Raedt, et al. Proceedings of the 22nd International Conference on Machine Learning, 2005.
[9] Vivek S. Borkar, et al. Q-Learning for Risk-Sensitive Control, 2002, Math. Oper. Res.
[10] Martin J. Wainwright, et al. Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization, 2007, NIPS.
[11] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 1995.
[12] Sergey Levine, et al. Path integral guided policy search, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[13] Ang Li, et al. Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control, 2020, ICLR.
[14] Gerhard Neumann, et al. Variational Inference for Policy Search in changing situations, 2011, ICML.
[15] Marc Toussaint, et al. Robot trajectory optimization using approximate inference, 2009, ICML '09.
[16] Geoffrey E. Hinton, et al. Using Expectation-Maximization for Reinforcement Learning, 1997, Neural Computation.
[17] Amir-massoud Farahmand, et al. Iterative Value-Aware Model Learning, 2018, NeurIPS.
[18] J. W. Nieuwenhuis, et al. Boekbespreking van D.P. Bertsekas (ed.), Dynamic programming and optimal control - volume 2, 1999.
[19] Martin A. Riedmiller, et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images, 2015, NIPS.
[20] Robert M. Thrall, et al. Mathematics of Operations Research, 1978.
[21] Masashi Sugiyama, et al. Efficient Sample Reuse in EM-Based Policy Search, 2009, ECML/PKDD.
[22] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[23] Peter A. Flach, et al. Proceedings of the 28th International Conference on Machine Learning, 2011.
[24] Roger Levin, et al. Consistency, 2020, Journal of the American Dental Association.
[25] Chong Wang, et al. Stochastic variational inference, 2012, J. Mach. Learn. Res.
[26] Sergey Levine, et al. Variational Policy Search via Trajectory Optimization, 2013, NIPS.
[27] Stefan Schaal, et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML '07.
[28] Emanuel Todorov, et al. General duality between optimal control and estimation, 2008, 2008 47th IEEE Conference on Decision and Control.
[29] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[30] Matthew Fellows, et al. VIREL: A Variational Inference Framework for Reinforcement Learning, 2018, NeurIPS.
[31] Honglak Lee, et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, 2018, NeurIPS.
[32] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[33] Vicenç Gómez, et al. Optimal control as a graphical model inference problem, 2009, Machine Learning.
[34] Sergey Levine, et al. Learning Complex Neural Network Policies with Trajectory Optimization, 2014, ICML.
[35] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[36] Sergey Levine, et al. When to Trust Your Model: Model-Based Policy Optimization, 2019, NeurIPS.
[37] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[38] M. V. Rossum, et al. In Neural Computation, 2022.
[39] Ofir Nachum, et al. Path Consistency Learning in Tsallis Entropy Regularized MDPs, 2018, ICML.
[40] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[41] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[42] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[43] Mohammad Ghavamzadeh, et al. Policy-Aware Model Learning for Policy Gradient Methods, 2020, arXiv.