Grammaticality, Acceptability, and Probability: A Probabilistic View of Linguistic Knowledge

The question of whether humans represent grammatical knowledge as a binary condition on membership in a set of well-formed sentences, or as a probabilistic property, has been debated among linguists, psychologists, and cognitive scientists for many decades. Acceptability judgments present a serious problem for both classical binary and probabilistic theories of grammaticality. These judgments are gradient in nature, and so cannot be directly accommodated in a binary formal grammar. However, it is also not possible to simply reduce acceptability to probability. The acceptability of a sentence is not the same as the likelihood of its occurrence, which is determined in part by factors like sentence length and lexical frequency. In this paper, we present the results of a set of large-scale experiments using crowd-sourced acceptability judgments that demonstrate gradience to be a pervasive feature of acceptability judgments. We then show how one can predict acceptability judgments on the basis of probability by augmenting probabilistic language models with an acceptability measure. This is a function that normalizes probability values to eliminate the confounding factors of length and lexical frequency. We describe a sequence of modeling experiments with unsupervised language models drawn from state-of-the-art machine learning methods in natural language processing. Several of these models achieve very encouraging levels of accuracy in the acceptability prediction task, as measured by the correlation between the acceptability measure scores and mean human acceptability values. We consider the relevance of these results to the debate on the nature of grammatical competence, and we argue that they support the view that linguistic knowledge can be intrinsically probabilistic.
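
To make the idea of an acceptability measure concrete, the following is a minimal sketch and not the implementation used in the experiments. The abstract does not say which normalizing functions are used, so the two shown here are assumptions chosen for illustration: mean log probability (dividing by length) and a log-odds style score often called SLOR, which subtracts the sentence's unigram log probability before dividing by length. The toy corpus, the model log probabilities, and the human ratings are all hypothetical values, and the names `unigram_logprob`, `mean_logprob`, `slor`, and `pearson` are introduced only for this example.

```python
import math
from collections import Counter

# Hypothetical toy corpus used only to estimate unigram (lexical) frequencies.
CORPUS = "the cat sat on the mat the dog sat on the rug".split()
COUNTS = Counter(CORPUS)
TOTAL = len(CORPUS)


def unigram_logprob(tokens, counts, total):
    """Sum of log unigram probabilities, with add-one smoothing."""
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen words
    return sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)


def mean_logprob(model_logprob, tokens):
    """Normalize for length only: log P_model(s) / |s|."""
    return model_logprob / len(tokens)


def slor(model_logprob, tokens, counts, total):
    """Normalize for length and lexical frequency:
    (log P_model(s) - log P_unigram(s)) / |s|."""
    return (model_logprob - unigram_logprob(tokens, counts, total)) / len(tokens)


def pearson(xs, ys):
    """Pearson correlation between measure scores and mean human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical inputs: (sentence, log P_model(s), mean human rating).
# In practice the log probabilities would come from a trained language model.
ITEMS = [
    ("the cat sat on the mat", -9.5, 3.8),
    ("the dog sat on the rug", -10.1, 3.7),
    ("cat the on sat mat the", -21.0, 1.4),
    ("the rug sat the dog on", -17.3, 1.9),
]

scores = [slor(lp, s.split(), COUNTS, TOTAL) for s, lp, _ in ITEMS]
ratings = [r for _, _, r in ITEMS]
print("SLOR scores:", [round(x, 3) for x in scores])
print("Pearson r vs. ratings:", round(pearson(scores, ratings), 3))
```

The design point is that raw log probability falls as sentences get longer or contain rarer words, regardless of how well formed they are. Dividing by length removes the first effect, and subtracting the unigram term removes the second, so the remaining score can be compared across sentences and correlated with mean human ratings.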
