The Tradition of Categoricity and Prospects for Stochasticity

“Everyone knows that language is variable.” This is the bald sentence with which Sapir (1921:147) begins his chapter on language as an historical product. He goes on to emphasize how two speakers’ usage is bound to differ “in choice of words, in sentence structure, in the relative frequency with which particular forms or combinations of words are used”. I should add that much sociolinguistic and historical linguistic research has shown that the same speaker’s usage is also variable (Labov 1966, Kroch 2001:722). However, the tradition of most syntacticians has been to ignore this thing that everyone knows.

Human languages are the prototypical example of a symbolic system. From very early on, logics and logical reasoning were invented for handling natural language understanding. Logics and formal languages have a language-like form that draws from and meshes well with natural languages. It is not immediately obvious where the continuous and quantitative aspects of syntax are. The dominant answer in syntactic theory has been “nowhere” (Chomsky 1969:57; also 1956, 1957, etc.): “It must be recognized that the notion ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.”

In the 1950s there were prospects for probabilistic methods taking hold in linguistics, in part due to the influence of the new field of Information Theory (Shannon 1948). Chomsky’s influential remarks had the effect of killing off interest in probabilistic methods for syntax, just as for a long time McCarthy and Hayes (1969) discouraged exploration of probabilistic methods in Artificial Intelligence.

Among his arguments were that: (i) probabilistic models wrongly mix in world knowledge (New York occurs more in text than Dayton, Ohio, but for no linguistic reason); (ii) probabilistic models don’t model grammaticality (neither “Colorless green ideas sleep furiously” nor “Furiously sleep ideas green colorless” has previously been uttered, and hence, Chomsky wrongly assumes, both must be estimated to have probability zero, but the former is grammatical while the latter is not); and (iii) use of probabilities does not meet the goal of describing the mind-internal I-language as opposed to the observed-in-the-world E-language. This chapter is not meant to be a detailed critique of Chomsky’s arguments (Abney (1996) provides a survey and a rebuttal, and Pereira (2000) has further useful discussion), but some of these concerns are still important to discuss.
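To make the objection to argument (ii) concrete, the following is a minimal sketch of how a smoothed probabilistic model can assign nonzero, and different, probabilities to two word strings that never occurred in its training data. Everything in it is an assumption made for illustration: the toy lexicon, the five-sentence corpus, the four word classes, and the add-alpha smoothing are hypothetical and are not the model of any work cited here. The score covers only the sequence of word classes; because the two test strings contain the same words, any per-word term of the form P(word | class) would be identical for both and would not change the comparison.

```python
import math
from collections import Counter

# Hypothetical toy lexicon mapping words to word classes (an assumption for
# illustration; real models would learn or induce such classes from data).
LEXICON = {
    "colorless": "ADJ", "green": "ADJ", "big": "ADJ", "old": "ADJ", "new": "ADJ",
    "ideas": "N", "dogs": "N", "plans": "N",
    "sleep": "V", "bark": "V", "fail": "V",
    "furiously": "ADV", "loudly": "ADV", "quietly": "ADV",
}

# Toy training corpus; neither test string below occurs in it.
CORPUS = [
    "big old dogs bark loudly",
    "new plans fail quietly",
    "old dogs sleep quietly",
    "green plans fail",
    "new ideas fail loudly",
]


def classes(sentence):
    """Map a sentence to its class sequence, padded with boundary symbols."""
    return ["<s>"] + [LEXICON[w] for w in sentence.split()] + ["</s>"]


def train(corpus):
    """Count class bigrams and how often each class occurs as a left context."""
    bigrams, contexts = Counter(), Counter()
    for sentence in corpus:
        seq = classes(sentence)
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def log_prob(sentence, bigrams, contexts, alpha=1.0):
    """Add-alpha smoothed log-probability of the sentence's class sequence."""
    outcomes = set(LEXICON.values()) | {"</s>"}  # symbols that can come next
    seq = classes(sentence)
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        numerator = bigrams[(prev, cur)] + alpha
        denominator = contexts[prev] + alpha * len(outcomes)
        total += math.log(numerator / denominator)
    return total


bigrams, contexts = train(CORPUS)
grammatical = log_prob("colorless green ideas sleep furiously", bigrams, contexts)
scrambled = log_prob("furiously sleep ideas green colorless", bigrams, contexts)
print(f"grammatical order: {grammatical:.2f}")
print(f"scrambled order:   {scrambled:.2f}")
```

Under these assumptions both strings receive a finite log-probability, so neither is estimated to have probability zero, and the grammatical order scores higher than the scrambled one because its class transitions resemble those seen in training. The sketch says nothing about whether such a model captures grammaticality in general; it only shows that "never previously uttered" need not imply "probability zero".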

[1]  W. Labov,et al.  Empirical foundations for a theory of language change , 2014 .

[2]  P. Smolensky,et al.  Optimality Theory: Constraint Interaction in Generative Grammar , 2004 .

[3]  R. Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[4]  Alan Agresti,et al.  Categorical Data Analysis , 2003 .

[5]  Natalie Sciarini-Gourianova,et al.  Beyond Grammar: An Experience-Based Theory of Language (review) , 2003 .

[6]  E. Ziegel Generalized Linear Models , 2002, Technometrics.

[7]  Bas Aarts,et al.  Exploring Natural Language: Working with the British Component of the International Corpus of English , 2002 .

[8]  Paola Merlo,et al.  Automatic distinction of arguments and modifiers: the case of prepositional phrases , 2001, CoNLL.

[9]  Edward Flemming Scalar and categorical phenomena in a unified model of phonetics and phonology , 2001, Phonology.

[10]  Jonas Kuhn,et al.  Formal and computational aspects of optimality-theoretic syntax , 2001 .

[11]  P. Kantor Foundations of Statistical Natural Language Processing , 2001, Information Retrieval.

[12]  Frank Keller,et al.  Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality , 2001 .

[13]  P. Boersma,et al.  Empirical Tests of the Gradual Learning Algorithm , 2001, Linguistic Inquiry.

[14]  J. Bresnan Lexical-Functional Syntax , 2000 .

[15]  A. Sorace Gradients in Auxiliary Selection with Intransitive Verbs. , 2000 .

[16]  Christopher D. Manning,et al.  Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger , 2000, EMNLP.

[17]  Mark Johnson,et al.  Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training , 2000, ACL.

[18]  Fernando C Pereira Formal grammar and information theory: together again? , 2000, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[19]  Helge Lødrup,et al.  Linking and Optimality in the Norwegian Presentational Focus Construction , 1999, Nordic Journal of Linguistics.

[20]  Daniel A. Powers,et al.  Statistical Methods for Categorical Data Analysis , 1999 .

[21]  Judith Aissen,et al.  Markedness and Subject Choice in Optimality Theory , 1999 .

[22]  Maryellen C. MacDonald,et al.  A probabilistic constraints approach to language acquisition and processing , 1999, Cogn. Sci..

[23]  G. Müller Optimality, markedness, and word order in German , 1999 .

[24]  Mark Johnson,et al.  Estimators for Stochastic “Unification-Based” Grammars , 1999, ACL.

[25]  David E. Hapeman Statistical Analysis of Categorical Data , 1999, Technometrics.

[26]  E. Schneider Sociolinguistic Theory: Linguistic Variation and Its Social Significance , 1999 .

[27]  Adwait Ratnaparkhi,et al.  Learning to Parse Natural Language with Maximum Entropy Models , 1999, Machine Learning.

[28]  Joshua B. Tenenbaum,et al.  Bayesian Modeling of Human Concept Learning , 1998, NIPS.

[29]  Rens Bod,et al.  A Probabilistic Corpus-Driven Model for Lexical-Functional Analysis , 1998, ACL.

[30]  Carson T. Schütze The empirical base of linguistics: Grammaticality judgments and linguistic methodology , 1998 .

[31]  Noam Chomsky The Minimalist Program , 1998, Journal of Linguistics.

[32]  Ad Neeleman,et al.  Conflict resolution in passive formation , 1998 .

[33]  Cornelia Maria Verspoor,et al.  Contextually-Dependent Lexical Semantics , 1997 .

[34]  A. Agresti An introduction to categorical data analysis , 1997 .

[35]  Eugene Charniak,et al.  Statistical Parsing with a Context-Free Grammar and Word Statistics , 1997, AAAI/IAAI.

[36]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[37]  Thomas Wasow,et al.  Remarks on grammatical weight , 1997, Language Variation and Change.

[38]  Bill Reynolds,et al.  Optimality Theory and variable word-final deletion in Faetar , 1997, Language Variation and Change.

[39]  W. Cowart Experimental Syntax: Applying Objective Methods to Sentence Judgments , 1997 .

[40]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[41]  Steven P. Abney Stochastic Attribute-Value Grammars , 1996, CL.

[42]  P. Resnik Selectional constraints: an information-theoretic model and its computational realization , 1996, Cognition.

[43]  Geoffrey K. Pullum,et al.  Learnability, Hyperlearning, and the Poverty of the Stimulus , 1996 .

[44]  Thomas E. Hukari,et al.  Adjunct extraction , 1995, Journal of Linguistics.

[45]  Beth Levin,et al.  Building on a corpus: A linguistic and lexicographical look at some near-synonyms* , 1995 .

[46]  Richard A. Demers,et al.  Predicates and pronominal arguments in Straits Salish , 1994 .

[47]  Glyn Morrill,et al.  Type Logical Grammar: Categorial Logic of Signs , 1994 .

[48]  Walt Wolfram,et al.  Convergent explanation and alternative regularization patterns: Were/weren't leveling in a vernacular English variety , 1994, Language Variation and Change.

[49]  Ralph Grishman,et al.  Comlex Syntax: Building a Computational Lexicon , 1994, COLING.

[50]  T. Givon The pragmatics of de-transitive voice: Functional and typological aspects of inversion , 1994 .

[51]  Ronald Rosenfeld,et al.  Adaptive Statistical Language Modeling; A Maximum Entropy Approach , 1994 .

[52]  William D. Raymond,et al.  An Optimality-Theoretic Typology of Case and Grammatical Voice Systems , 1993 .

[53]  Christopher D. Manning Automatic Acquisition of a Large Sub Categorization Dictionary From Corpora , 1993, ACL.

[54]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[55]  益子 真由美 On Argument Structure , 1993 .

[56]  Joan Maling,et al.  Of Nominative and Accusative: The Hierarchical Assignment of Grammatical Case in Finnish , 1993 .

[57]  Paul Kroeger,et al.  Phrase Structure and Grammatical Relations in Tagalog , 1992 .

[58]  John J. Godfrey,et al.  SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[59]  Yen-hui Audrey Li,et al.  Order and Constituency in Mandarin Chinese , 1990 .

[60]  Ted Briscoe,et al.  The Syntactic Regularity of English Noun Phrases , 1989, EACL.

[61]  Jan Svartvik,et al.  A comprehensive grammar of the English language , 1988 .

[62]  J. Ney What was transformational grammar , 1987 .

[63]  John McCarthy,et al.  SOME PHILOSOPHICAL PROBLEMS FROM THE STANDPOINT OF ARTIFICIAL INTELLIGENCE , 1987 .

[64]  Evidence against the “Grammatical”/“Ungrammatical” Distinction , 1987, Corpus Linguistics and Beyond.

[65]  Paul Smolensky,et al.  Information processing in dynamical systems: foundations of harmony theory , 1986 .

[66]  Jan Svartvik On voice in the English verb , 1985 .

[67]  Jung-Il Suh,et al.  On the Variable Rules , 1983 .

[68]  Eloise Jelinek,et al.  The Agent Hierarchy and Voice in Some Coast Salish Languages , 1983, International Journal of American Linguistics.

[69]  W. Labov,et al.  Constraints on the agentless passive , 1983, Journal of Linguistics.

[70]  Ivan A. Sag,et al.  On parasitic gaps , 1983 .

[71]  Jenny Cheshire Variation in the use of ain't in an urban British English dialect , 1981, Language in Society.

[72]  T. Bever,et al.  The Non-Uniqueness of Linguistic Intuitions. , 1981 .

[73]  Charles James Nice Bailey,et al.  Variation and Linguistic Theory , 1981 .

[74]  Stephen E. Fienberg,et al.  The analysis of cross-classified categorical data , 1980 .

[75]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[76]  S. Fienberg,et al.  Discrete Multivariate Analysis: Theory and Practice , 1976 .

[77]  Taylor L. Booth,et al.  Applying Probability Measures to Abstract Languages , 1973, IEEE Transactions on Computers.

[78]  J. Darroch,et al.  Generalized Iterative Scaling for Log-Linear Models , 1972 .

[79]  W. Labov Contraction, Deletion, and Inherent Variability of the English Copula. , 1969 .

[80]  S. Reder,et al.  Grammatical complexity and inference , 1969 .

[81]  E. Mark Gold,et al.  Language Identification in the Limit , 1967, Inf. Control..

[82]  Joseph H. Greenberg,et al.  Language Universals: With Special Reference to Feature Hierarchies , 1966 .

[83]  T. E. Harris,et al.  The Theory of Branching Processes. , 1963 .

[84]  Noam Chomsky Three models for the description of language , 1956, IRE Trans. Inf. Theory.

[85]  Edward Sapir,et al.  Language: An Introduction to the Study of Speech , 1955 .

[86]  M. Joos Description of Language Design , 1950 .

[87]  C. Fairman,et al.  Plain Words: A Guide to the Use of English , 1949 .

[88]  H. Fowler,et al.  A Dictionary of Modern English Usage , 1926 .

[89]  Noam Chomsky,et al.  Quine's empirical assumptions , 2004, Synthese.

[90]  Steven Abney,et al.  Statistical Methods and Linguistics , 2002 .

[91]  F. Ramsey,et al.  The statistical sleuth : a course in methods of data analysis , 2002 .

[92]  Andrew Koontz-Garboden,et al.  A stochastic OT approach to word order variation in Korlai Portuguese , 2001 .

[93]  Finn V. Jensen,et al.  Bayesian Networks and Decision Graphs , 2001, Statistics for Engineering and Information Science.

[94]  Shipra Dingare,et al.  The Effect of Feature Hierarchies on Frequencies of Passivization in English , 2001 .

[95]  Christopher D. Manning,et al.  Soft Constraints Mirror Hard Constraints : Voice and Person in English and Lummi , 2001 .

[96]  David Mumford,et al.  The Dawning of the Age of Stochasticity , 2000 .

[97]  Bas Aarts,et al.  Parsing in reverse — Exploring ICE-GB with Fuzzy Tree Fragments and ICECUP , 2000, Corpora Galore.

[98]  Michael Barlow,et al.  Usage-based models of language , 2000 .

[99]  Daniel Kersten,et al.  High-level Vision as Statistical Inference , 1999 .

[100]  Paul Boersma,et al.  Phonology-semantics interaction in OT, and its acquisition* , 1999 .

[101]  Daniel Jurafsky,et al.  How Verb Subcategorization Frequencies Are Affected By Corpus Choice , 1998, COLING.

[102]  Mitchell P. Marcus,et al.  Maximum entropy models for natural language ambiguity resolution , 1998 .

[103]  Christopher Culy,et al.  Statistical Distribution and the Grammatical/Ungrammatical Distinction , 1998, Grammars.

[104]  Robert Malouf,et al.  Mixed categories in the hierarchical lexicon , 1998 .

[105]  P. Boersma How we learn variation, optionality and probability , 1997 .

[106]  Judith L. Klavans,et al.  Book Reviews: The Balancing Act: Combining Symbolic and Statistical Approaches to Language , 1997, CL.

[107]  Ivan A. Sag,et al.  Book Reviews: Head-driven Phrase Structure Grammar and German in Head-driven Phrase-structure Grammar , 1996, CL.

[108]  Stephen Wechsler,et al.  The domain of direct case assignment , 1996 .

[109]  J. Harkins Review of Labov, William (1994) Principles of linguistic change, Volume 1: Internal factors , 1996 .

[110]  Carson T. Schütze PP attachment and argumenthood , 1995 .

[111]  Yoshua,et al.  Pattern Recognition and Neural Networks , 1995 .

[112]  C. Snow,et al.  Input and interaction in language acquisition: The changing role of negative evidence in theories of language development , 1994 .

[113]  William Thomas Reynolds,et al.  Variation and phonological theory , 1994 .

[114]  A. Zaenen Unaccusativity in Dutch: Integrating Syntax and Lexical Semantics , 1993 .

[115]  Barbara B. Levin,et al.  English verb classes and alternations , 1993 .

[116]  Sten Vikner,et al.  Obligatory Adjuncts and the Structure of Events , 1993 .

[117]  J. Milroy,et al.  Real English: The Grammar of English Dialects in the British Isles , 1993 .

[118]  Kepa Korta Carrión,et al.  Formal semantics for natural language , 1993 .

[119]  Franziska Wulf Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[120]  R. Carpenter The Logic of Typed Feature Structures: Definite Clause Programming , 1992 .

[121]  Arne Olofsson,et al.  A participle caught in the act. On the prepositional use of following , 1990 .

[122]  Géraldine Legendre,et al.  Can Connectionism Contribute to Syntax? Harmonic Grammar, with an Application ; CU-CS-485-90 , 1990 .

[123]  Lioba J. Moshi,et al.  Object asymmetries in comparative Bantu syntax , 1990 .

[124]  Patrick M. Farrell,et al.  Grammatical relations : a cross-theoretical perspective , 1990 .

[125]  Beatrice Santorini Part-of-speech tagging guidelines for the penn treebank project , 1990 .

[126]  Dominique Estival,et al.  Formal and Functional aspects of the development from passive to ergative systems , 1988 .

[127]  George Fowler The syntax of the genitive case in Russian , 1987 .

[128]  Ivan A. Sag,et al.  Information-based syntax and semantics , 1987 .

[129]  Michael Halliday,et al.  An Introduction to Functional Grammar , 1985 .

[130]  Gregory R. Guy LINGUISTIC VARIATION IN BRAZILIAN PORTUGUESE: ASPECTS OF THE PHONOLOGY, SYNTAX, AND LANGUAGE HISTORY , 1981 .

[131]  Noam Chomsky,et al.  Lectures on Government and Binding , 1981 .

[132]  Noam Chomsky,et al.  Language and responsibility: Based on conversations with Mitsou Ronat , 1979 .

[133]  Frans Plank,et al.  Ergativity : towards a theory of grammatical relations , 1979 .

[134]  Heinz Vater,et al.  On the possibility of distinguishing between complements and adjuncts , 1978 .

[135]  Frank Parker,et al.  ON SYNTACTIC CHANGE , 1978 .

[136]  Noam Chomsky,et al.  The Logical Structure of Linguistic Theory , 1975 .

[137]  東京言語研究所,et al.  Three dimensions of linguistic theory , 1973 .

[138]  Charles James Nice Bailey,et al.  New ways of analyzing variation in English , 1973 .

[139]  James Jay Horning,et al.  A study of grammatical inference , 1969 .

[140]  Eugene Galanter,et al.  Handbook of mathematical psychology: I. , 1963 .

[141]  Lucien Tesnière Éléments de syntaxe structurale , 1959 .