Does Deep Learning Learn to Abstract? A Systematic Probing Framework

Abstraction is a desirable capability for deep learning models: the ability to induce abstract concepts from concrete instances and to apply them flexibly beyond the learning context. However, there is currently no clear understanding of either the presence or the further characteristics of this capability in deep learning models. In this paper, we introduce a systematic probing framework that explores the abstraction capability of deep learning models from a transferability perspective. A set of controlled experiments conducted within this framework provides strong evidence that two probed pre-trained language models (PLMs), T5 and GPT-2, possess the abstraction capability. Our in-depth analysis sheds further light on this capability: (1) the training phase exhibits a "memorize-then-abstract" two-stage process; (2) the learned abstract concepts are concentrated in a few middle-layer attention heads rather than being evenly distributed throughout the model; (3) the probed abstraction capability is robust against concept mutations, and more robust to low-level/source-side mutations than to high-level/target-side ones; (4) generic pre-training is critical to the emergence of the abstraction capability, and PLMs exhibit better abstraction with larger model sizes and data scales.
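To make the transferability perspective concrete, the sketch below illustrates one way such a probe could be set up: an abstract rule is instantiated with two disjoint surface vocabularies, a model is fine-tuned on one instantiation, and zero-shot performance on the other instantiation measures whether the learned rule transfers beyond the learning context. This is a minimal, hypothetical sketch; the task (sequence reversal), the vocabularies, and the data sizes are illustrative assumptions, not the paper's actual probing tasks.

```python
# Minimal sketch of a transferability-style abstraction probe (illustrative assumptions,
# not the paper's actual setup).
import random

random.seed(0)

def make_examples(vocab, n, rule=lambda seq: list(reversed(seq))):
    """Instantiate one abstract rule (here: sequence reversal) with a concrete vocabulary."""
    examples = []
    for _ in range(n):
        seq = [random.choice(vocab) for _ in range(random.randint(3, 6))]
        examples.append((" ".join(seq), " ".join(rule(seq))))
    return examples

# Two disjoint surface vocabularies sharing the same underlying abstract rule.
train_vocab = ["red", "blue", "green", "yellow"]   # seen during fine-tuning
probe_vocab = ["cat", "dog", "fish", "bird"]       # unseen, "mutated" concepts

train_set = make_examples(train_vocab, n=1000)     # fine-tune a PLM on these pairs
probe_set = make_examples(probe_vocab, n=200)      # evaluate zero-shot transfer here

# If the fine-tuned model solves probe_set well above chance, the rule it learned is
# abstract (vocabulary-independent) rather than a memorized surface pattern.
print(train_set[0], probe_set[0])
```

The design choice here is that the only thing shared between the fine-tuning data and the probe data is the abstract rule itself, so any transfer observed can be attributed to abstraction rather than to overlap in surface tokens.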
