Machine Discovery of Comprehensible Strategies for Simple Games Using Meta-interpretive Learning

Recently, world-class human players have been outperformed in a number of complex two-person games (Go, Chess, Checkers) by Deep Reinforcement Learning systems. However, the data efficiency of these learning systems is unclear, since they appear to require far more training games to reach such performance than any human player could experience in a lifetime. In addition, the resulting learned strategies are not in a form that can be communicated to human players. This contrasts with earlier research in Behavioural Cloning, in which single-agent skills were machine-learned in a symbolic language, making it possible to teach them to human beings. In this paper, we consider Machine Discovery of human-comprehensible strategies for simple two-person games (Noughts-and-Crosses and Hexapawn). One advantage of considering simple games is that there is a tractable approach to calculating minimax regret. We use these games to compare Cumulative Minimax Regret for variants of both standard and deep reinforcement learning against two variants of a new Meta-interpretive Learning system called MIGO. In our experiments, the tested variants of both standard and deep reinforcement learning consistently perform worse (higher cumulative minimax regret) than both variants of MIGO on Noughts-and-Crosses and Hexapawn. In addition, MIGO's learned rules are relatively easy to comprehend, and they are demonstrated to achieve significant transfer learning in both directions between Noughts-and-Crosses and Hexapawn.
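The evaluation metric mentioned in the abstract, cumulative minimax regret, is tractable for games as small as Noughts-and-Crosses because the entire game tree can be enumerated. The sketch below is an illustration only, not the paper's code: the board encoding, function names, and the convention of scoring regret as best-achievable minus achieved value are assumptions. It computes exact minimax values and the per-move regret of a chosen move; summing these regrets over the moves a learner plays during training would give a cumulative regret curve of the kind compared in the paper.

```python
# Minimal sketch (assumed encoding, not from the paper): exact minimax values
# for Noughts-and-Crosses and the minimax regret of an individual move.
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value for 'X' (+1 win, 0 draw, -1 loss) with `player` to move."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    if '.' not in board:
        return 0
    nxt = 'O' if player == 'X' else 'X'
    values = [minimax(board[:i] + player + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == '.']
    return max(values) if player == 'X' else min(values)

def move_regret(board, player, move):
    """Regret of playing `move`: best achievable minimax value minus the
    value actually achieved, from the mover's perspective (0 = optimal)."""
    sign = 1 if player == 'X' else -1
    nxt = 'O' if player == 'X' else 'X'
    children = {i: board[:i] + player + board[i + 1:]
                for i, cell in enumerate(board) if cell == '.'}
    best = max(sign * minimax(c, nxt) for c in children.values())
    return best - sign * minimax(children[move], nxt)

# Example: X (cells 0,1) to move against O (cells 3,4).
# Completing the top row at cell 2 is optimal (regret 0); playing cell 8
# lets O win immediately at cell 5, so the regret is 1 - (-1) = 2.
print(move_regret('XX.OO....', 'X', 2))   # 0
print(move_regret('XX.OO....', 'X', 8))   # 2
```

Accumulating such per-move regrets over successive training games is one straightforward way to obtain the cumulative curves the abstract refers to; for Hexapawn the same exhaustive evaluation applies because its state space is similarly small.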
