Combining Configural and TD Learning on a Robot
[1] R. Rescorla. A theory of Pavlovian conditioning: The effectiveness of reinforcement and non-reinforcement, 1972.
[2] J. Gibbon. Scalar expectancy theory and Weber's law in animal timing, 1977.
[3] Stephen Grossberg, et al. Neural dynamics of adaptive timing and temporal discrimination during associative learning, 1989, Neural Networks.
[4] Joel L. Davis, et al. A Model of How the Basal Ganglia Generate and Use Neural Signals That Predict Reinforcement, 1994.
[5] J. Pearce. Similarity and discrimination: a selective review and a connectionist model, 1994, Psychological Review.
[6] P. Dayan, et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning, 1996, The Journal of Neuroscience: the official journal of the Society for Neuroscience.
[7] John W. Moore, et al. Predictive Timing Under Temporal Uncertainty: The TD Model of the Conditioned Response. To appear in D.A. Rosenbaum & C.E. Collyer (Eds.), Timing of behavior: Neural, computational, and psychological perspectives. Cambridge, MA: MIT Press, 1996.
[8] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.
[9] A. Kacelnik. Normative and descriptive models of decision making: time discounting and risk sensitivity, 2007, Ciba Foundation Symposium.
[10] Benjamin Van Roy, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[11] David S. Touretzky, et al. Operant Conditioning in Skinnerbots, 1997, Adapt. Behav.
[12] Bernard Widrow, et al. Perceptrons, adalines, and backpropagation, 1998.
[13] David S. Touretzky, et al. Behavioral considerations suggest an average reward TD model of the dopamine system, 2000, Neurocomputing.