Agents Technology Research

Abstract: This report describes three separate efforts pursued by the agents technology research group: state abstraction methods for reinforcement learning, the multi-agent credit assignment problem, and distributed multi-agent reputation management. State abstraction is a technique that allows machine learning methods to cope with problems that have large state spaces. The report details the development and analysis of a new algorithm, Reinforcement Learning using State Abstraction via NeuroEvolution (RL-SANE), which uses neuroevolution to automate the process of state abstraction. The multi-agent credit assignment problem arises when multiple learning agents in a domain are provided with only a single global reward signal. Learning is hard in these scenarios because each agent struggles to determine the value of its own contribution to obtaining the global reward. The report describes the problem in detail, along with one approach we investigated that uses a Kalman filter to derive local rewards from the global reward. Multi-agent reputation management is important in open domains where the goals and interests of the agents are diverse and potentially in conflict with one another. Each agent can use reputation and trust to determine which other agents in the system it should cooperate with and which it should not. The report details the development of the Affinity Management System (AMS), an approach for managing and learning trust in a distributed fashion that utilizes self-modeling.
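
To make the state abstraction effort concrete, the following sketch shows how an evolved network can reduce a large continuous state space to a small set of discrete states for a tabular learner. It is a minimal illustration of the RL-SANE idea, not the report's implementation; the network interface, the bin count, and the learning parameters are all assumptions made for the example.

    # Minimal sketch of neuroevolution-driven state abstraction, assuming an
    # evolved network `net` (e.g. produced by NEAT) that maps a raw
    # observation to a single output in [0, 1].

    def abstract_state(net, observation, num_bins):
        # Squash the continuous observation into one of num_bins abstract
        # states; evolution tunes `net` so that states needing the same
        # action tend to land in the same bin.
        y = net(observation)  # assumed to return a value in [0, 1]
        return min(int(y * num_bins), num_bins - 1)

    def q_update(q_table, net, obs, action, reward, next_obs,
                 alpha=0.1, gamma=0.95, num_bins=50):
        # Ordinary tabular Q-learning over the abstracted state space.
        s = abstract_state(net, obs, num_bins)
        s_next = abstract_state(net, next_obs, num_bins)
        target = reward + gamma * max(q_table[s_next])
        q_table[s][action] += alpha * (target - q_table[s][action])

The appeal of this arrangement is that the abstraction (the network) and the policy (the table) are learned by different mechanisms, so the tabular learner never has to see the raw state space.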
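
The Kalman filter approach to credit assignment can be sketched in the same spirit. Here each agent models the global reward as its own local reward plus a slowly drifting term contributed by the other agents; filtering out the drift leaves a residual that serves as the derived local reward. The random-walk model and the noise variances below are illustrative assumptions, not the report's actual parameters.

    class LocalRewardFilter:
        # Sketch of deriving a local reward from a shared global reward with
        # a scalar Kalman filter: the global reward g_t is modeled as
        # r_t + b_t, where b_t (the combined contribution of the other
        # agents) drifts as a random walk and r_t is this agent's own
        # reward.  The variances are assumptions for illustration.

        def __init__(self, drift_var=0.1, reward_var=1.0):
            self.b_hat = 0.0      # estimated contribution of the other agents
            self.p = 1.0          # variance of that estimate
            self.q = drift_var    # process noise: how quickly b_t drifts
            self.r = reward_var   # observation noise: spread of r_t

        def local_reward(self, global_reward):
            self.p += self.q                       # predict: b_t random walk
            gain = self.p / (self.p + self.r)      # Kalman gain
            residual = global_reward - self.b_hat  # estimated local reward
            self.b_hat += gain * residual          # correct the drift estimate
            self.p *= 1.0 - gain
            return residual

An agent would substitute local_reward(g) for the raw global reward g in its usual reinforcement learning update, letting each learner train on a signal that better reflects its own contribution.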
