Learning and Exploiting Relative Weaknesses of Opponent Agents

Agents in a competitive interaction can greatly benefit from adapting to a particular adversary, rather than using the same general strategy against all opponents. One method of such adaptation is opponent modeling, in which a model of an opponent is acquired and used as part of the agent's decision procedure in future interactions with that opponent. However, acquiring an accurate model of a complex opponent strategy may be computationally infeasible. Moreover, if the learned model is inaccurate, using it to predict the opponent's actions may harm the agent's strategy rather than improve it. We therefore define the concept of opponent weakness, and present a method for learning a model of this simpler concept. We analyze examples of an opponent's past behavior in a particular domain, judging its actions with a trusted judge. We then infer a weakness model based on the opponent's actions relative to the domain state, and incorporate this model into our agent's decision procedure. We also make use of a similar self-weakness model, allowing the agent to prefer states in which the opponent is weak and our agent is strong; that is, states in which we have a relative advantage over the opponent. Experimental results in two different test domains demonstrate the agents' improved performance when making use of the weakness models.
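The following sketch illustrates the general idea of learning weakness models from judged past behavior and blending them into an evaluation function. It is an illustrative assumption, not the paper's actual implementation: the use of scikit-learn's DecisionTreeClassifier, the feature extractor, the blending weight alpha, and all function names are hypothetical.

```python
# Illustrative sketch (not the paper's exact algorithm): learn per-player
# weakness models from judged past positions, then bias a heuristic
# evaluation toward states where we hold a relative advantage.
from sklearn.tree import DecisionTreeClassifier


def extract_features(state):
    # Hypothetical domain featurizer; here we assume the state is
    # already a numeric feature vector describing the domain state.
    return state


def train_weakness_model(judged_examples):
    """judged_examples: list of (state_features, was_weak) pairs, where
    was_weak is a boolean label supplied by a trusted judge (e.g., a deep
    search comparing the move actually played against the best move)."""
    X = [features for features, _ in judged_examples]
    y = [weak for _, weak in judged_examples]
    model = DecisionTreeClassifier(max_depth=5)  # depth is an arbitrary choice
    model.fit(X, y)
    return model


def evaluate(state, heuristic, opp_model, self_model, alpha=0.3):
    """Blend a base domain heuristic with the predicted relative advantage:
    P(opponent plays weakly here) - P(we play weakly here)."""
    f = [extract_features(state)]
    # With boolean labels, sklearn orders classes [False, True],
    # so column 1 is the probability of a weak move.
    p_opp_weak = opp_model.predict_proba(f)[0][1]
    p_self_weak = self_model.predict_proba(f)[0][1]
    return heuristic(state) + alpha * (p_opp_weak - p_self_weak)
```

Plugging such an evaluation into an ordinary game-tree search would steer play toward states where the opponent's predicted weakness exceeds the agent's own, which is one plausible way to operationalize the relative-advantage preference described above.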
