Acquisition of Chess Knowledge in AlphaZero

What is learned by sophisticated neural network agents such as AlphaZero? This question is of both scientific and practical interest. If the representations of strong neural networks bear no resemblance to human concepts, our ability to understand faithful explanations of their decisions will be restricted, ultimately limiting what we can achieve with neural network interpretability. In this work we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess. By probing for a broad range of human chess concepts we show when and where these concepts are represented in the AlphaZero network. We also provide a behavioural analysis focusing on opening play, including qualitative commentary from chess Grandmaster Vladimir Kramnik. Finally, we carry out a preliminary investigation looking at the low-level details of AlphaZero's representations, and make the resulting behavioural and representational analyses available online.
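To make the probing approach mentioned above concrete, the sketch below shows one common form of concept probing: fitting a sparse linear classifier that predicts a human concept label from a layer's activations. This is a minimal illustration under assumptions, not the paper's actual pipeline; the data is synthetic, and names such as the number of positions, the activation dimensionality, and the example concept ("side to move has a material advantage") are hypothetical placeholders.

```python
# Minimal sketch of a sparse linear concept probe.
# Assumption: activations have already been extracted from a trained network;
# here they are replaced by synthetic data so the example runs standalone.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: one activation vector per chess position (e.g. a pooled
# residual-block output) and a binary label for a human concept such as
# "side to move has a material advantage".
n_positions, d_activations = 5_000, 256
activations = rng.normal(size=(n_positions, d_activations))
concept_labels = rng.integers(0, 2, size=n_positions)

X_train, X_test, y_train, y_test = train_test_split(
    activations, concept_labels, test_size=0.2, random_state=0
)

# If a simple regularised linear model can predict the concept from the
# activations, the concept is linearly decodable at this layer. Repeating
# this across layers and training checkpoints gives the "when and where"
# picture described in the abstract.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(X_train, y_train)
print(f"probe test accuracy: {probe.score(X_test, y_test):.3f}")
```

In practice one would compare probe accuracy against a baseline (e.g. probing randomly initialised networks or shuffled labels) to check that decodability reflects learned structure rather than probe capacity.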
