Automatic Bridge Bidding Using Deep Reinforcement Learning

Bridge remains one of the zero-sum games in which artificial intelligence has not yet outperformed expert human players. The main difficulty lies in the bidding phase, which requires cooperative decision making under partial information. Existing artificial intelligence systems for bridge bidding rely on, and are thus restricted by, human-designed bidding systems or features. In this work, we propose a flexible and pioneering bridge-bidding system that can learn either with or without the aid of human domain knowledge. The system is based on a novel deep reinforcement learning model that extracts sophisticated features and learns to bid automatically from raw card data. The model includes an upper-confidence-bound algorithm and additional techniques to balance exploration and exploitation. We further study how different pieces of human knowledge can be exploited to assist the model. Our experiments demonstrate the promising performance of the proposed model. In particular, the model can advance from no knowledge of bidding to outperforming a champion-winning computer bridge program that implements a human-designed bidding system. Moreover, further synergies can be extracted by incorporating expert knowledge into the proposed model.
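The abstract credits an upper-confidence-bound algorithm for balancing exploration and exploitation during learning. The paper's exact variant is not specified here; as a generic illustration, the sketch below shows the standard UCB1 action-selection rule (mean reward plus an exploration bonus that shrinks as an action is tried more often). The function name `ucb1_select` and its interface are illustrative assumptions, not the paper's implementation.

```python
import math

def ucb1_select(counts, values, total, c=2.0):
    """Return the index of the action maximizing the UCB1 score.

    counts[a] -- number of times action a has been tried (illustrative interface)
    values[a] -- cumulative reward observed for action a
    total     -- total number of selections made so far
    c         -- exploration constant (2.0 gives the classic UCB1 bonus)
    """
    # Try every action at least once before applying the UCB formula.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    # Mean reward plus exploration bonus sqrt(c * ln(total) / counts[a]).
    return max(
        range(len(counts)),
        key=lambda a: values[a] / counts[a]
        + math.sqrt(c * math.log(total) / counts[a]),
    )
```

Note how the bonus term favors rarely tried actions: even an action with a lower empirical mean is selected once its count falls far enough behind, which is the exploration behavior the abstract refers to.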
