Simple is Better: Training an End-to-end Contract Bridge Bidding Agent without Human Knowledge

Contract bridge is a multi-player imperfect-information game in which two partnerships compete against each other, with the players in each partnership collaborating. The game consists of two phases: bidding and playing. While the playing phase is relatively easy for modern software, bidding is challenging and requires agents to jointly learn a communication protocol that lets them reach the optimal contract given only their own private information. The agents need to convey information to their partners, and interfere with their opponents, through a sequence of actions. In this work, we train a strong agent to bid competitively purely through self-play, outperforming WBridge5, a championship-winning program. Furthermore, we show that explicitly modeling belief is not necessary to boost performance. To our knowledge, this is the first competitive bridge agent trained with no domain knowledge. It outperforms the previous state-of-the-art, which uses human replays, with 70x fewer parameters.
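
Below is a minimal sketch of the kind of self-play policy-gradient loop the abstract describes, included only to make the setup concrete; it is not the paper's system. Every name here (BiddingNet, selfplay_episode, toy_score) is hypothetical, the action space omits double/redouble, the training rule is plain REINFORCE rather than the distributed actor-critic setup such work typically uses, and the toy high-card-point reward stands in for double-dummy scoring of the reached contract.

```python
# A minimal, self-contained sketch of self-play bridge bidding, assuming:
# 36 actions (pass + 35 contract bids, no double/redouble), a toy
# high-card-point reward standing in for double-dummy scoring, and
# plain REINFORCE with no baseline or distributed actors.
import random
import torch
import torch.nn as nn

N_ACTIONS = 36   # action 0 = pass, actions 1..35 = contract bids 1C .. 7NT
HAND_DIM = 52    # binary card-membership encoding of the private hand
MAX_BIDS = 40    # fixed-length bidding-history encoding (truncated for simplicity)

class BiddingNet(nn.Module):
    """Maps (private hand, public bidding history) to logits over the next bid."""
    def __init__(self, hidden=200):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(HAND_DIM + MAX_BIDS * N_ACTIONS, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, N_ACTIONS)

    def forward(self, x):
        return self.head(self.body(x))

def obs(hand, history):
    """Concatenate the hand vector with a one-hot bidding-history tensor."""
    v = torch.zeros(HAND_DIM)
    v[hand] = 1.0
    h = torch.zeros(MAX_BIDS, N_ACTIONS)
    for i, a in enumerate(history[:MAX_BIDS]):
        h[i, a] = 1.0
    return torch.cat([v, h.flatten()])

def toy_score(hands, highest, side):
    """Stand-in for double-dummy scoring: reward the declaring side when its
    combined high-card points (A=4 .. J=1) support the level it bid to."""
    if side is None:                      # deal was passed out
        return 0.0
    hcp = sum(max(0, c % 13 - 8) for p in (side, side + 2) for c in hands[p])
    level = (highest - 1) // 5 + 1        # bids 1..35 map to levels 1..7
    return torch.tanh(torch.tensor((hcp - 14 - 3 * level) / 10.0)).item()

def selfplay_episode(net):
    """One deal bid by four copies of the same policy; returns per-player
    log-probs, the terminal reward, and which side last named a contract."""
    deck = list(range(52))
    random.shuffle(deck)
    hands = [deck[13 * i:13 * (i + 1)] for i in range(4)]
    history, logps = [], [[] for _ in range(4)]
    highest, passes, side, player = 0, 0, None, 0
    while True:
        logits = net(obs(hands[player], history))
        mask = torch.full((N_ACTIONS,), float('-inf'))
        mask[0] = 0.0                     # pass is always legal
        mask[highest + 1:] = 0.0          # only strictly higher bids are legal
        dist = torch.distributions.Categorical(logits=logits + mask)
        a = dist.sample()
        logps[player].append(dist.log_prob(a))
        a = a.item()
        history.append(a)
        if a == 0:
            passes += 1
        else:
            highest, passes, side = a, 0, player % 2
        # a contract followed by three passes ends the auction; four opening
        # passes end it with no contract
        if (passes == 3 and highest > 0) or passes == 4:
            return logps, toy_score(hands, highest, side), side
        player = (player + 1) % 4

net = BiddingNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
for step in range(1000):
    logps, z, side = selfplay_episode(net)
    losses = []
    for p in range(4):                    # zero-sum: +z for declarers, -z else
        sign = 1.0 if side is not None and p % 2 == side else -1.0
        losses.append(-sign * z * torch.stack(logps[p]).sum())
    opt.zero_grad()
    torch.stack(losses).sum().backward()
    opt.step()
```

The one property the sketch does preserve is the core difficulty named in the abstract: the only channel between partners is the public bid sequence itself, so any convention the agents use to exchange information must emerge from training rather than being hand-coded.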
