Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
[1] M. A. L. Thathachar,et al. A new approach to the design of reinforcement schemes for learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[2] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[3] Richard S. Sutton,et al. Associative search network: A reinforcement learning associative memory , 1981, Biological Cybernetics.
[4] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .
[5] Jing Peng,et al. Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .
[6] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[7] K. Narendra,et al. Decentralized learning in finite Markov chains , 1985, 1985 24th IEEE Conference on Decision and Control.
[8] R. J. Williams,et al. On the use of backpropagation in associative reinforcement learning , 1988, IEEE 1988 International Conference on Neural Networks.
[9] Jonathan Baxter,et al. Learning internal representations , 1995, COLT '95.
[10] Keith Price,et al. Review of "Principles of Artificial Intelligence by Nils J. Nilsson", Tioga Publishing Company, Palo Alto, CA, ISBN 0-935382-01-1. , 1980, SGAR.
[11] Nils J. Nilsson,et al. Principles of Artificial Intelligence , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] M. Gabriel,et al. Learning and Computational Neuroscience: Foundations of Adaptive Networks , 1990 .
[13] Yann LeCun,et al. Une procédure d'apprentissage pour réseau à seuil asymétrique (A learning scheme for asymmetric threshold networks) , 1985 .
[14] P. Anandan,et al. Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[15] Richard S. Sutton,et al. Learning and Sequential Decision Making , 1989 .
[16] Geoffrey E. Hinton,et al. Learning and relearning in Boltzmann machines , 1986 .
[17] James L. McClelland,et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .
[18] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[19] A. G. Barto,et al. Learning by statistical cooperation of self-interested neuron-like computing elements , 1985, Human neurobiology.
[20] Vijaykumar Gullapalli,et al. A stochastic reinforcement learning algorithm for learning real-valued functions , 1990, Neural Networks.
[21] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[22] Kumpati S. Narendra,et al. An N-player sequential stochastic game with identical payoffs , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[23] Robert J. Beaver,et al. An Introduction to Probability Theory and Mathematical Statistics , 1977 .
[24] Graham C. Goodwin,et al. Adaptive filtering prediction and control , 1984 .
[25] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[26] Michael I. Jordan,et al. Forward Models: Supervised Learning with a Distal Teacher , 1992, Cogn. Sci.
[27] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.