Does Deep Learning Learn to Abstract? A Systematic Probing Framework

Abstraction is a desirable capability for deep learning models: the ability to induce abstract concepts from concrete instances and to apply them flexibly beyond the learning context. However, there is currently no clear understanding of either the presence or the further characteristics of this capability in deep learning models. In this paper, we introduce a systematic probing framework that explores the abstraction capability of deep learning models from a transferability perspective. A set of controlled experiments conducted within this framework provides strong evidence that two probed pre-trained language models (PLMs), T5 and GPT-2, possess the abstraction capability. Our in-depth analysis sheds further light on this capability: (1) the training phase exhibits a "memorize-then-abstract" two-stage process; (2) the learned abstract concepts are concentrated in a few middle-layer attention heads rather than being evenly distributed throughout the model; (3) the probed abstraction capability is robust against concept mutations, and more robust to low-level/source-side mutations than to high-level/target-side ones; (4) generic pre-training is critical to the emergence of the abstraction capability, and PLMs exhibit better abstraction with larger model sizes and data scales.
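To make the transferability perspective concrete, the sketch below illustrates one way such a probe could be set up: an abstract rule is instantiated with two disjoint surface vocabularies, a model is fine-tuned on one instantiation, and zero-shot performance on the other instantiation measures whether the learned rule transfers beyond the learning context. This is a minimal, hypothetical sketch; the task (sequence reversal), the vocabularies, and the data sizes are illustrative assumptions, not the paper's actual probing tasks.

```python
# Minimal sketch of a transferability-style abstraction probe (illustrative assumptions,
# not the paper's actual setup).
import random

random.seed(0)

def make_examples(vocab, n, rule=lambda seq: list(reversed(seq))):
    """Instantiate one abstract rule (here: sequence reversal) with a concrete vocabulary."""
    examples = []
    for _ in range(n):
        seq = [random.choice(vocab) for _ in range(random.randint(3, 6))]
        examples.append((" ".join(seq), " ".join(rule(seq))))
    return examples

# Two disjoint surface vocabularies sharing the same underlying abstract rule.
train_vocab = ["red", "blue", "green", "yellow"]   # seen during fine-tuning
probe_vocab = ["cat", "dog", "fish", "bird"]       # unseen, "mutated" concepts

train_set = make_examples(train_vocab, n=1000)     # fine-tune a PLM on these pairs
probe_set = make_examples(probe_vocab, n=200)      # evaluate zero-shot transfer here

# If the fine-tuned model solves probe_set well above chance, the rule it learned is
# abstract (vocabulary-independent) rather than a memorized surface pattern.
print(train_set[0], probe_set[0])
```

The design choice here is that the only thing shared between the fine-tuning data and the probe data is the abstract rule itself, so any transfer observed can be attributed to abstraction rather than to overlap in surface tokens.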
