[1] D. Bertsekas. Gradient convergence in gradient methods, 1997.
[2] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[3] Richard L. Lewis, et al. Optimal Rewards for Cooperative Agents, 2014, IEEE Transactions on Autonomous Mental Development.
[4] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[5] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[6] Pieter Abbeel, et al. Gradient Estimation Using Stochastic Computation Graphs, 2015, NIPS.
[7] John S. Edwards, et al. The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence, 1983.
[8] Andrew G. Barto, et al. Motor primitive discovery, 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[9] OpenAI. Learning Dexterous In-Hand Manipulation, 2018.
[10] A. G. Barto, et al. Learning by statistical cooperation of self-interested neuron-like computing elements, 1985, Human Neurobiology.
[11] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[12] Bart De Schutter, et al. Multi-agent Reinforcement Learning: An Overview, 2010.
[13] Philip S. Thomas, et al. Policy Gradient Coagent Networks, 2011, NIPS.
[14] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[15] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[16] Andrew G. Barto, et al. Conjugate Markov Decision Processes, 2011, ICML.
[17] Paul J. Werbos, et al. Regular Cycles of Forward and Backward Signal Propagation in Prefrontal Cortex and in Consciousness, 2016, Front. Syst. Neurosci.
[18] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.