Towards Deep Symbolic Reinforcement Learning

Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, which means they are slow to learn even when such datasets are available. Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important. In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As a proof of concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system, though just a prototype, learns effectively and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.
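
The abstract describes the architecture only at a high level: a neural back end that maps raw pixels to a symbolic scene description, and a symbolic front end that learns and acts over those symbols. The sketch below is a minimal, hypothetical illustration of that two-stage pipeline, not the authors' implementation. The back end is stubbed out where the paper would use a trained convolutional network, the front end is ordinary tabular Q-learning over the symbolic state, and every name here (neural_back_end, SymbolicFrontEnd, env) is invented for illustration.

```python
"""Minimal sketch of a neural-back-end / symbolic-front-end pipeline.

All names are hypothetical illustrations of the architecture described
in the abstract, not the paper's actual code.
"""
import random
from collections import defaultdict


def neural_back_end(frame):
    """Stub for the perceptual stage: map raw input to symbols.

    In the full architecture this would be a trained convolutional
    network whose activations are quantised into a list of
    (object_type, x, y) facts. Here we assume `frame` is already a
    dict of object positions, so extraction is trivial.
    """
    return tuple(sorted((t, x, y) for t, (x, y) in frame.items()))


class SymbolicFrontEnd:
    """Tabular Q-learning over the symbolic state.

    Because states are compact relational descriptions rather than
    pixel arrays, the Q-table stays small and its entries are
    directly readable by a human.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy choice over the symbolic state.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# One interaction step, assuming a hypothetical `env` with reset()/step():
#   agent = SymbolicFrontEnd(actions=["up", "down", "left", "right"])
#   state = neural_back_end(env.reset())
#   action = agent.act(state)
#   frame, reward = env.step(action)
#   agent.update(state, action, reward, neural_back_end(frame))
```

The point of the split is visible in the Q-table keys: a state is a tuple of facts such as ('agent', 2, 3) rather than a pixel array, which is what makes the learned value function compact and inspectable, the property the abstract credits for the system's interpretability and its advantage on the stochastic game variant.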
