Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock
Philip S. Thomas | Chris Nota | James E. Kostas
[1] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[2] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[3] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[4] Pieter Abbeel, et al. Gradient Estimation Using Stochastic Computation Graphs, 2015, NIPS.
[5] Andrew G. Barto, et al. Motor primitive discovery, 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[6] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[7] John N. Tsitsiklis, et al. Gradient Convergence in Gradient methods with Errors, 1999, SIAM J. Optim.
[8] Kumpati S. Narendra, et al. Learning automata - an introduction, 1989.
[9] Paul J. Werbos, et al. Regular Cycles of Forward and Backward Signal Propagation in Prefrontal Cortex and in Consciousness, 2016, Front. Syst. Neurosci.
[10] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[11] Gerald Tesauro, et al. Learning Abstract Options, 2018, NeurIPS.
[12] Xinhua Zhang, et al. Conditional random fields for multi-agent reinforcement learning, 2007, ICML '07.
[13] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[14] Lakhmi C. Jain, et al. Innovations in Multi-Agent Systems and Applications - 1, 2010.
[15] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[16] D. Bertsekas. Gradient convergence in gradient methods, 1997.
[17] Andrew G. Barto, et al. Conjugate Markov Decision Processes, 2011, ICML.
[18] A. G. Barto, et al. Learning by statistical cooperation of self-interested neuron-like computing elements, 1985, Human neurobiology.
[19] John S. Edwards, et al. The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence, 1983.
[20] Richard L. Lewis, et al. Optimal Rewards for Cooperative Agents, 2014, IEEE Transactions on Autonomous Mental Development.
[21] Bart De Schutter, et al. Multi-agent Reinforcement Learning: An Overview, 2010.
[22] Philip S. Thomas, et al. Policy Gradient Coagent Networks, 2011, NIPS.
[23] Michail G. Lagoudakis, et al. Coordinated Reinforcement Learning, 2002, ICML.