Combining Configural and TD Learning on a Robot

We combine configural and temporal difference (TD) learning in a classical conditioning model. The model is able to solve the negative patterning problem, discriminate sequences of stimuli, and exhibit second-order conditioning. We have implemented the algorithm on the Sony AIBO entertainment robot, allowing us to interact with the conditioning model in real time.
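
To illustrate why configural units matter for negative patterning (A rewarded, B rewarded, the compound AB not rewarded), here is a minimal sketch. It is not the paper's model: it uses the delta rule (Rescorla-Wagner), which is the single-step special case of TD learning, and the function names, learning rate, and epoch count are all assumptions chosen for illustration. A purely elemental linear learner cannot fit the three trial types, while adding one hypothetical configural unit that is active only when A and B occur together makes the problem solvable.

```python
def features(a, b, configural=True):
    """Encode a trial as elemental units [A, B], plus an optional AB configural unit."""
    x = [float(a), float(b)]
    if configural:
        x.append(float(a and b))  # assumed configural unit: active only for the compound
    return x


def predict(w, x):
    """Linear reward prediction: weighted sum of the active units."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(configural, epochs=2000, lr=0.1):
    """Delta-rule training on negative patterning: A+, B+, AB-."""
    trials = [((1, 0), 1.0), ((0, 1), 1.0), ((1, 1), 0.0)]
    w = [0.0] * (3 if configural else 2)
    for _ in range(epochs):
        for (a, b), reward in trials:
            x = features(a, b, configural)
            delta = reward - predict(w, x)  # prediction error for this trial
            w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
    return w


if __name__ == "__main__":
    for configural in (False, True):
        w = train(configural)
        label = "configural" if configural else "elemental"
        for a, b in [(1, 0), (0, 1), (1, 1)]:
            v = predict(w, features(a, b, configural))
            print(f"{label}: A={a} B={b} prediction={v:.3f}")
```

With the configural unit the learner can assign positive weights to A and B and a negative weight to the compound unit, so the prediction for AB can fall to zero; without it, the compound prediction is forced to be the sum of the two elemental predictions.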

E. Tira-Thompson

[1] R. Rescorla. A theory of Pavlovian conditioning: The effectiveness of reinforcement and non-reinforcement, 1972.

[2] J. Gibbon. Scalar expectancy theory and Weber's law in animal timing, 1977.

[3] Stephen Grossberg, et al. Neural dynamics of adaptive timing and temporal discrimination during associative learning, 1989, Neural Networks.

[4] Joel L. Davis, et al. A Model of How the Basal Ganglia Generate and Use Neural Signals That Predict Reinforcement, 1994.

[5] J. Pearce. Similarity and discrimination: a selective review and a connectionist model, 1994, Psychological Review.

[6] P. Dayan, et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning, 1996, The Journal of Neuroscience.

[7] John W. Moore, et al. Predictive Timing Under Temporal Uncertainty: The TD Model of the Conditioned Response, 1996. To appear in D.A. Rosenbaum & C.E. Collyer (Eds.), Timing of behavior: Neural, computational, and psychological perspectives. Cambridge, MA: MIT Press.

[8] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.

[9] A. Kacelnik. Normative and descriptive models of decision making: time discounting and risk sensitivity, 2007, Ciba Foundation Symposium.

[10] Benjamin Van Roy, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.

[11] David S. Touretzky, et al. Operant Conditioning in Skinnerbots, 1997, Adapt. Behav.

[12] Bernard Widrow, et al. Perceptrons, adalines, and backpropagation, 1998.

[13] David S. Touretzky, et al. Behavioral considerations suggest an average reward TD model of the dopamine system, 2000, Neurocomputing.