Language Models Use Monotonicity to Assess NPI Licensing

We investigate the semantic knowledge of language models (LMs), focusing on (1) whether these LMs create categories of linguistic environments based on their semantic monotonicity properties, and (2) whether these categories play a role in LMs similar to the one they play in human language understanding, using negative polarity item (NPI) licensing as a case study. We introduce a series of experiments comprising probing with diagnostic classifiers (DCs), linguistic acceptability tasks, and a novel DC ranking method that connects the probing results directly to the inner workings of the LM. By applying our experimental pipeline to LMs trained on various filtered corpora, we gain stronger insight into the semantic generalizations acquired by these models.
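To make the probing setup concrete: a diagnostic classifier is a simple supervised model, typically linear, trained on the frozen hidden states of an LM to predict a linguistic property; here, whether the environment is downward entailing and therefore licenses an NPI such as "ever" ("Nobody has ever seen it" vs. *"Someone has ever seen it"). The sketch below is a minimal illustration of this general technique, not the paper's implementation; the model choice, the layer and token position probed, and the toy data are all assumptions introduced for illustration.

```python
# Minimal sketch of probing with a diagnostic classifier (DC):
# a linear classifier trained on frozen LM hidden states to predict
# a monotonicity label for the licensing environment. The model name,
# layer/token choices, and toy data are illustrative assumptions.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Toy probing data: sentences labelled with the monotonicity of the
# environment (1 = downward entailing, 0 = upward entailing).
sentences = [
    "Nobody has ever seen it.",       # DE environment: licenses "ever"
    "Few people have ever tried.",    # DE
    "Someone has already seen it.",   # UE
    "Many people have often tried.",  # UE
]
labels = [1, 1, 0, 0]

def cls_vector(sentence: str, layer: int = -1) -> torch.Tensor:
    """Hidden state of the [CLS] token at the given layer."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0, 0]

X = torch.stack([cls_vector(s) for s in sentences]).numpy()
dc = LogisticRegression(max_iter=1000).fit(X, labels)

# High DC accuracy suggests the representations linearly encode
# monotonicity; dc.coef_ defines the direction in activation space
# on which such a probe relies.
print(dc.score(X, labels))
```

A ranking-style follow-up in this spirit would compare the direction defined by the fitted coefficients against the LM's own output behavior, which is the kind of connection between probe and model that a DC ranking method aims to establish.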
