Reinforcement Learning by Construction of Hypothetical Targets
[1] Bernard Widrow, et al. Punish/Reward: Learning with a Critic in Adaptive Threshold Systems, 1973, IEEE Trans. Syst. Man Cybern.
[2] P. Anandan, et al. Pattern-recognizing stochastic learning automata, 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[3] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[4] B. Widrow, et al. The truck backer-upper: an example of self-learning in neural networks, 1989, International 1989 Joint Conference on Neural Networks.
[5] Frank Fallside, et al. Dynamic reinforcement driven error propagation networks with application to game playing, 1989.
[6] Rainer Händel, et al. Integrated broadband networks: an introduction to ATM-based networks, 1991.
[7] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.