Efficient bidding strategies for Cliff-Edge problems

In this paper, we propose an efficient agent for competing in Cliff-Edge (CE) and simultaneous Cliff-Edge (SCE) situations. In CE interactions, which include common interactions such as sealed-bid auctions, dynamic pricing and the ultimatum game (UG), the probability of success decreases monotonically as the reward for success increases. This trade-off also exists in SCE interactions, which include simultaneous auctions and various multi-player ultimatum games, where the agent must decide on more than one offer or bid simultaneously. Our agent competes repeatedly in one-shot interactions, each time against different human opponents. The agent learns the general pattern of the population’s behavior, and its performance is evaluated over all of the interactions in which it participates. We propose a generic approach that may help the agent compete against unknown opponents in different environments where CE and SCE interactions exist, where the agent has a relatively large number of alternatives, and where its achievements in the first several dozen interactions are important. The underlying mechanism we propose for CE interactions is a new meta-algorithm, deviated virtual learning (DVL), which extends existing methods to cope efficiently with environments comprising a large number of alternative decisions at each decision point. Another competitive approach is the Bayesian approach, which learns the opponents’ statistical distribution, given prior knowledge about the type of distribution. For SCE interactions, we propose the simultaneous deviated virtual reinforcement learning algorithm (SDVRL), the segmentation meta-algorithm as a method for extending different basic algorithms, and a heuristic called fixed success probabilities (FSP).
Experiments comparing the proposed algorithms with algorithms taken from the literature, as well as with other intuitive meta-algorithms, show that the proposed algorithms are superior in average payoff and stability, as well as in the accuracy with which they converge to the optimal action, in both CE and SCE problems.
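
The CE trade-off described above can be illustrated with a minimal sketch of a proposer in an ultimatum game over 100 units. This is not the DVL algorithm or any other method from the paper. The logistic acceptance model, its `midpoint` and `steepness` parameters, and all function names are assumptions introduced purely for illustration. The sketch shows that as the proposer's reward on success (`total - offer`) grows, the assumed probability of success falls, so the expected payoff peaks at an intermediate offer.

```python
import math


def acceptance_probability(offer, midpoint=30.0, steepness=0.2):
    """Hypothetical responder model (an assumption, not from the paper):
    the chance that an offer out of 100 is accepted, increasing in the offer."""
    return 1.0 / (1.0 + math.exp(-steepness * (offer - midpoint)))


def expected_payoff(offer, total=100.0):
    """Proposer keeps (total - offer) if accepted and nothing otherwise."""
    return (total - offer) * acceptance_probability(offer)


def best_offer(total=100):
    """Exhaustive search over integer offers; feasible only because the
    acceptance model is assumed known, which a learning agent would not have."""
    return max(range(total + 1), key=expected_payoff)


if __name__ == "__main__":
    offer = best_offer()
    print("offer maximizing expected payoff under the assumed model:", offer)
    print("expected payoff at that offer:", round(expected_payoff(offer), 2))
```

In the setting the abstract describes, the acceptance behavior of the population is unknown and has to be learned from repeated one-shot interactions, which is where the proposed algorithms come in; the exhaustive search here only makes the shape of the trade-off concrete.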
