Learning Intelligent Behavior in a Non-stationary and Partially Observable Environment

Individual learning in an environment where more than one agent exists is a challenging task. In this paper, a single learning agent situated in an environment with multiple agents is modeled using reinforcement learning. The environment is non-stationary and only partially accessible from an agent's point of view, so the learning activities of an agent are influenced by the actions of other cooperative or competitive agents. A prey-hunter capture game with these characteristics is defined and used in experiments to simulate the learning process of individual agents. Experimental results show that there are no strict rules for reinforcement learning in this setting. We propose two new methods to improve agent performance; these methods reduce the number of states while retaining as much state information as necessary.
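To make the setting above concrete, the following is a minimal sketch assuming a tabular Q-learning hunter chasing a randomly moving prey on a small grid. The hunter observes the prey only within a limited sensing range (partial observability), and the prey's own moves make the task non-stationary from the hunter's point of view. The grid size, sensing range, reward values, hyperparameters, and the collapsing of all "prey unseen" situations into a single observation are illustrative assumptions, not the paper's actual environment or its state-reduction methods.

import random
from collections import defaultdict

# Illustrative tabular Q-learning sketch: a hunter chases a randomly moving prey
# on a small grid. The hunter senses the prey only within SENSE_RANGE, so its
# observation is partial; the prey's random movement makes the environment
# non-stationary from the hunter's perspective. All names and values are assumptions.

GRID = 5            # grid is GRID x GRID
SENSE_RANGE = 2     # hunter sees the prey only within this Chebyshev distance
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

def clip(pos):
    # Keep a position inside the grid.
    return (min(max(pos[0], 0), GRID - 1), min(max(pos[1], 0), GRID - 1))

def observe(hunter, prey):
    # Relative prey position if visible, else a single "unseen" observation.
    # Collapsing all unseen configurations into one observation keeps the
    # observation space small (an illustrative form of state reduction).
    dx, dy = prey[0] - hunter[0], prey[1] - hunter[1]
    if max(abs(dx), abs(dy)) <= SENSE_RANGE:
        return (dx, dy)
    return None

Q = defaultdict(float)  # Q[(observation, action)] -> estimated value

def choose_action(obs):
    # Epsilon-greedy action selection over the current observation.
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    values = [Q[(obs, a)] for a in range(len(ACTIONS))]
    return values.index(max(values))

for episode in range(5000):
    hunter, prey = (0, 0), (GRID - 1, GRID - 1)
    for step in range(50):
        obs = choose = observe(hunter, prey)
        a = choose_action(obs)
        hunter = clip((hunter[0] + ACTIONS[a][0], hunter[1] + ACTIONS[a][1]))
        prey_move = random.choice(ACTIONS)          # prey moves randomly
        prey = clip((prey[0] + prey_move[0], prey[1] + prey_move[1]))
        captured = hunter == prey
        reward = 10.0 if captured else -0.1
        next_obs = observe(hunter, prey)
        best_next = max(Q[(next_obs, b)] for b in range(len(ACTIONS)))
        # Standard Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[(obs, a)] += ALPHA * (reward + GAMMA * best_next - Q[(obs, a)])
        if captured:
            break

Because the prey is not controlled by the learner, the hunter's environment changes even when its own policy is fixed, which is the non-stationarity the abstract refers to; the limited sensing range is one simple way to model partial observability.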
