Harnessing the linguistic signal to predict scalar inferences

Pragmatic inferences often subtly depend on the presence or absence of linguistic features. For example, the presence of a partitive construction ("of the") increases the strength of a so-called scalar inference: listeners perceive the inference that Chris did not eat all of the cookies to be stronger after hearing "Chris ate some of the cookies" than after hearing the same utterance without a partitive, "Chris ate some cookies." In this work, we explore to what extent neural network sentence encoders can learn to predict the strength of scalar inferences. We first show that an LSTM-based sentence encoder trained on an English dataset of human inference strength ratings predicts those ratings with high accuracy (r = 0.78). We then probe the model's behavior using manually constructed minimal sentence pairs and corpus data. We find that the model has learned previously established associations between linguistic features and inference strength, suggesting that it learns to use linguistic features to predict pragmatic inferences.
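
The two evaluation ideas mentioned above, correlation between predicted and human ratings and probing with minimal sentence pairs, can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson_r` and `minimal_pair_effect` are hypothetical, and `predict` stands in for any trained model that maps a sentence to an inference strength rating.

```python
from math import sqrt


def pearson_r(predicted, human):
    """Pearson correlation between predicted and human ratings."""
    if len(predicted) != len(human) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_h = sum(human) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in r).
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    if var_p == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_p * var_h)


def minimal_pair_effect(predict, with_feature, without_feature):
    """Difference in predicted strength between two sentences that differ
    only in one linguistic feature (for example, the partitive)."""
    return predict(with_feature) - predict(without_feature)
```

With a trained model in place of `predict`, a positive value of `minimal_pair_effect(predict, "Chris ate some of the cookies", "Chris ate some cookies")` would indicate that the model assigns a stronger inference to the partitive variant, which is the direction the human ratings show.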
