Looking Back on the Actor–Critic Architecture
[1] J. Stevens, et al. Animal Intelligence, 1883, Nature.
[2] Donald Michie. Experiments on the Mechanization of Game-Learning Part I. Characterization of the Model and its Parameters, 1963, Comput. J.
[3] A. H. Klopf, et al. Brain Function and Adaptive Systems: A Heterostatic Theory, 1972.
[4] Bernard Widrow, et al. Punish/Reward: Learning with a Critic in Adaptive Threshold Systems, 1973, IEEE Trans. Syst. Man Cybern.
[5] M. L. Tsetlin, et al. Automaton Theory and Modeling of Biological Systems, 1973.
[6] E. Harth, et al. Alopex: A Stochastic Method for Determining Visual Receptive Fields, 1974, Vision Research.
[7] Kumpati S. Narendra, et al. Learning Automata: A Survey, 1974, IEEE Trans. Syst. Man Cybern.
[8] Stephen A. Ritz, et al. Distinctive Features, Categorical Perception, and Probability Learning: Some Applications of a Neural Model, 1977.
[9] Teuvo Kohonen, et al. Associative Memory: A System-Theoretical Approach, 1977.
[10] J. Gittins, et al. A Dynamic Allocation Index for the Discounted Multiarmed Bandit Problem, 1979.
[11] A. G. Barto, et al. Toward a Modern Theory of Adaptive Networks: Expectation and Prediction, 1981, Psychological Review.
[12] Richard S. Sutton, et al. Goal Seeking Components for Adaptive Intelligence: An Initial Assessment, 1981.
[13] R. Sutton, et al. Simulation of Anticipatory Responses in Classical Conditioning by a Neuron-like Adaptive Element, 1982, Behavioural Brain Research.
[14] W. G. Lehnert, et al. The Hedonistic Neuron: A Theory of Memory, Learning, and Intelligence (A. H. Klopf), 1983.
[15] Kumpati S. Narendra, et al. An N-Player Sequential Stochastic Game with Identical Payoffs, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[16] Richard S. Sutton, et al. Temporal Credit Assignment in Reinforcement Learning, 1984.
[17] Richard S. Sutton, et al. Training and Tracking in Robotics, 1985, IJCAI.
[18] A. G. Barto, et al. Learning by Statistical Cooperation of Self-Interested Neuron-like Computing Elements, 1985, Human Neurobiology.
[19] Geoffrey E. Hinton, et al. Learning Internal Representations by Error Propagation, 1986.
[20] Charles W. Anderson, et al. Learning and Problem-Solving with Multilayer Connectionist Systems, 1986.
[21] P. Anandan, et al. Cooperativity in Networks of Pattern Recognizing Stochastic Learning Automata, 1986.
[22] Andrew G. Barto, et al. Game-Theoretic Cooperativity in Networks of Self-Interested Units, 1987.
[23] Charles W. Anderson, et al. Strategy Learning with Multilayer Connectionist Representations, 1987.
[24] O. G. Selfridge, et al. Pandemonium: A Paradigm for Learning, 1988.
[25] Kumpati S. Narendra, et al. Learning Automata: An Introduction, 1989.
[26] C. W. Anderson, et al. Learning to Control an Inverted Pendulum Using Neural Networks, 1989, IEEE Control Systems Magazine.
[27] W. S. McCulloch, et al. A Logical Calculus of the Ideas Immanent in Nervous Activity, 1990, The Philosophy of Artificial Intelligence.
[28] A. Barto, et al. Adaptive Critics and the Basal Ganglia, 1994.
[29] Joel L. Davis, et al. Adaptive Critics and the Basal Ganglia, 1995.
[30] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.
[31] Charles W. Anderson, et al. Approximating a Policy Can Be Easier Than Approximating a Value Function, 2000.
[32] Derong Liu, et al. Action-Dependent Adaptive Critic Designs, 2001, IJCNN'01: International Joint Conference on Neural Networks, Proceedings (Cat. No. 01CH37222).
[33] Peter Dayan, et al. Q-Learning, 1992, Machine Learning.
[34] Richard S. Sutton, et al. Associative Search Network: A Reinforcement Learning Associative Memory, 1981, Biological Cybernetics.
[35] S.-I. Amari, et al. Neural Theory of Association and Concept-Formation, 1977, Biological Cybernetics.
[36] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[37] E. Harth, et al. The Alopex Process: Visual Receptive Fields by Response Feedback, 1979, Biological Cybernetics.
[38] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[39] Richard S. Sutton, et al. Learning to Predict by the Methods of Temporal Differences, 1988, Machine Learning.
[40] Shalabh Bhatnagar, et al. Natural Actor-Critic Algorithms, 2009, Autom.
[41] A. Cooper, et al. Predictive Reward Signal of Dopamine Neurons, 2011.
[42] W. Ashby, et al. Design for a Brain: The Origin of Adaptive Behavior, 2011.
[43] Peter Vrancx, et al. Reinforcement Learning: State-of-the-Art, 2012.
[44] D. Newnham. Trial and Error, 2013, Nursing Standard (Royal College of Nursing (Great Britain): 1987).
[45] Donald Michie, et al. BOXES: An Experiment in Adaptive Control, 2013.
[46] Demis Hassabis, et al. Mastering the Game of Go with Deep Neural Networks and Tree Search, 2016, Nature.
[47] Demis Hassabis, et al. Mastering the Game of Go without Human Knowledge, 2017, Nature.
[48] C. Robert. Superintelligence: Paths, Dangers, Strategies, 2017.
[49] M. Mohri, et al. Bandit Problems, 2006.
[50] Peter W. Hawkins. Distinctive Features, 2018, Introducing Phonology.