Efficient agents for cliff-edge environments with a large set of decision options

This paper proposes an efficient agent for competing in Cliff Edge (CE) environments, such as sealed-bid auctions, dynamic pricing and the ultimatum game. The agent competes repeatedly in one-shot CE interactions, each time against a different human opponent, and its performance is evaluated over all the interactions in which it participates. The agent learns the general pattern of the population's behavior and does not rely on any examples of previous interactions in the environment, whether its own or those of other competitors. We propose a generic approach that competes in different CE environments under the same configuration, with no knowledge of the specific rules of each environment. The underlying mechanism of the proposed agent is a new meta-algorithm, Deviated Virtual Learning (DVL), which extends existing methods to cope efficiently with environments that offer a large number of decision options at each decision point. Experiments comparing the proposed algorithm with algorithms taken from the literature, as well as with another intuitive meta-algorithm, show that the proposed algorithm is significantly superior in average payoff and stability. In addition, the agent performed better than human competitors executing the same task.
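To make the setting concrete, the following is a minimal sketch of a cliff-edge interaction and of a learner that generalizes each outcome to neighboring options. It is not the DVL algorithm, whose details are not given here. The ultimatum-style payoff rule, the hidden responder threshold, the pseudo-count prior, and all names (`ce_payoff`, `GeneralizingAgent`, `simulate`) are assumptions made for illustration only.

```python
def ce_payoff(offer, threshold, total=100):
    """Proposer payoff in one ultimatum-style interaction (assumed rule).

    The proposer keeps total - offer if the offer meets the responder's
    hidden threshold, and gets nothing otherwise: the "cliff edge".
    """
    return total - offer if offer >= threshold else 0


class GeneralizingAgent:
    """Hypothetical learner: one outcome updates many neighboring options."""

    def __init__(self, total=100):
        self.total = total
        self.n = total + 1                 # offers 0..total
        # Optimistic pseudo-counts: every offer starts with acceptance rate 1.
        self.accepts = [1.0] * self.n
        self.trials = [1.0] * self.n

    def choose(self):
        # Greedy choice on estimated expected payoff per offer.
        def expected(o):
            return (self.total - o) * self.accepts[o] / self.trials[o]
        return max(range(self.n), key=expected)

    def update(self, offer, accepted):
        if accepted:
            # Assumption: any offer at least as generous would also succeed.
            for o in range(offer, self.n):
                self.accepts[o] += 1
                self.trials[o] += 1
        else:
            # Assumption: any offer at most as generous would also fail.
            for o in range(0, offer + 1):
                self.trials[o] += 1


def simulate(thresholds, total=100):
    """Play one interaction per opponent threshold; return (agent, mean payoff)."""
    agent = GeneralizingAgent(total)
    payoff_sum = 0
    for t in thresholds:
        offer = agent.choose()
        payoff = ce_payoff(offer, t, total)
        agent.update(offer, accepted=payoff > 0 or offer == total)
        payoff_sum += payoff
    return agent, payoff_sum / len(thresholds)
```

Calling `simulate` with a list of opponent thresholds returns the trained agent and its average payoff. Sharing each result across many offers is what keeps learning feasible when the option set is large, since a single interaction informs far more than the one offer actually made.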
