Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs
[1] Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[2] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[3] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res.
[4] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res.
[5] Karol Gregor,et al. Neural Variational Inference and Learning in Belief Networks , 2014, ICML.
[6] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[7] Rémi Munos,et al. Policy Gradient in Continuous Time , 2006, J. Mach. Learn. Res.
[8] Florentin Wörgötter,et al. Fast biped walking with a reflexive controller and real-time policy searching , 2005, NIPS.
[9] Pieter Abbeel,et al. Gradient Estimation Using Stochastic Computation Graphs , 2015, NIPS.
[10] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[11] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[12] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[13] Risto Miikkulainen,et al. HyperNEAT-GGP: a hyperNEAT-based atari general game player , 2012, GECCO '12.
[14] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res.
[15] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[16] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[17] Stephen J. Wright,et al. Numerical Optimization , 2006, Springer.
[18] Jeff G. Schneider,et al. Covariant Policy Search , 2003, IJCAI.
[19] Daniele Calandriello,et al. Safe Policy Iteration , 2013, ICML.
[20] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[21] Elizabeth L. Wilmer,et al. Markov Chains and Mixing Times , 2008.
[22] Vijay R. Konda,et al. On Actor-Critic Algorithms , 2003, SIAM J. Control Optim.
[23] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[24] Noah J. Cowan,et al. Efficient Gradient Estimation for Motor Control Learning , 2002, UAI.
[25] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[26] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[27] Shigenobu Kobayashi,et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function , 1998, ICML.
[28] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[29] Yasemin Altun,et al. Relative Entropy Policy Search , 2010.
[30] Bruno Scherrer,et al. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris , 2013, NIPS.
[31] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[32] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995.
[33] D. Hunter,et al. A Tutorial on MM Algorithms , 2004.
[34] Sham M. Kakade,et al. Optimizing Average Reward Using Discounted Rewards , 2001, COLT/EuroCOLT.
[35] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[36] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell.
[37] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[38] Marc Toussaint,et al. Learning model-free robot control by a Monte Carlo EM algorithm , 2009, Auton. Robots.
[39] Wojciech Zaremba,et al. Reinforcement Learning Neural Turing Machines , 2015, ArXiv.
[40] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[41] Sergey Levine,et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.
[42] Judea Pearl,et al. Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.
[43] Jürgen Schmidhuber,et al. Recurrent policy gradients , 2010, Log. J. IGPL.
[44] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[45] John Langford,et al. Search-based structured prediction , 2009, Machine Learning.
[46] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.
[47] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[48] Sergey Levine,et al. Optimism-driven exploration for nonlinear systems , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[49] Max Welling,et al. Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets , 2014, ICML.
[50] Kumpati S. Narendra,et al. Identification and control of dynamical systems using neural networks , 1990, IEEE Trans. Neural Networks.
[51] András Lőrincz,et al. Learning Tetris Using the Noisy Cross-Entropy Method , 2006, Neural Computation.
[52] Andreas Griewank,et al. Evaluating derivatives - principles and techniques of algorithmic differentiation, Second Edition , 2000, Frontiers in applied mathematics.
[53] Daan Wierstra,et al. Deep AutoRegressive Networks , 2013, ICML.
[54] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res.
[55] Michail G. Lagoudakis,et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers , 2003, ICML.
[56] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[57] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[58] D. K. Smith,et al. Numerical Optimization , 2001, J. Oper. Res. Soc.
[59] Paul Glasserman,et al. Monte Carlo Methods in Financial Engineering , 2003.
[60] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[61] John N. Tsitsiklis,et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes , 2003, Discret. Event Dyn. Syst.
[62] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[63] Michael I. Jordan,et al. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996.
[64] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[65] Yoshua Bengio,et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation , 2013, ArXiv.
[66] H. Sebastian Seung,et al. Stochastic policy gradient reinforcement learning on a simple 3D biped , 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566).
[67] Sean Gerrish,et al. Black Box Variational Inference , 2013, AISTATS.
[68] Andreas Griewank,et al. On automatic differentiation , 1988.
[69] Geoffrey E. Hinton,et al. A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.
[70] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[71] Benjamin Van Roy,et al. A neuro-dynamic programming approach to retailer inventory management , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[72] Filip De Turck,et al. Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks , 2016, ArXiv.
[73] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[74] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[75] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998.
[76] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[77] Ilya Sutskever,et al. Training Deep and Recurrent Networks with Hessian-Free Optimization , 2012, Neural Networks: Tricks of the Trade.
[78] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[79] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[80] K. Wampler,et al. Optimal gait and form for animal locomotion , 2009, SIGGRAPH 2009.
[81] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[82] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[83] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[84] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[85] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[86] Razvan Pascanu,et al. Revisiting Natural Gradient for Deep Networks , 2013, ICLR.
[87] Pawel Wawrzynski,et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay , 2009, Neural Networks.
[88] Nikolaus Hansen,et al. Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation , 1996, Proceedings of IEEE International Conference on Evolutionary Computation.
[89] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[90] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[91] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[92] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992.
[93] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.
[94] Philip Thomas,et al. Bias in Natural Actor-Critic Algorithms , 2014, ICML.
[95] David Wingate,et al. Automated Variational Inference in Probabilistic Programming , 2013, ArXiv.
[96] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.