Probabilistic models of word order and syntactic discontinuity

This thesis takes up the problem of syntactic comprehension, or parsing: how an agent (human or machine) with knowledge of a specific language infers the hierarchical structural relationships underlying a surface string in that language. I take the position that probabilistic models for combining evidential information are both cognitively plausible and practically useful for syntactic comprehension. In particular, the thesis applies probabilistic methods to two areas: the relationship between word order and psycholinguistic models of comprehension, and the practical problems of accuracy and efficiency in parsing sentences with syntactic discontinuity, that is, discontinuous constituency, where phrases do not consist of continuous substrings of a sentence.

On the psychological side, the thesis proposes a theory of expectation-based processing difficulty as a consequence of probabilistic syntactic disambiguation: the ease of processing a word during comprehension is determined primarily by the degree to which that word is expected. I identify a class of syntactic phenomena, associated primarily with verb-final clause order, where the predictions of expectation-based processing diverge most sharply from those of more established locality-based theories of processing difficulty. Using existing probabilistic parsing algorithms and syntactically annotated data sources, I show that the expectation-based theory matches a range of established experimental psycholinguistic results better than locality-based theories do. The comparison of probabilistic and locality-driven processing theories is a crucial area of psycholinguistic research because of its implications for the relationship between linguistic production and comprehension, and more generally for theories of modularity in cognitive science.
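One common way to make "the degree to which a word is expected" quantitative is surprisal: the difficulty of word w_i is taken to be -log2 P(w_i | w_1..w_{i-1}), which equals log2 P(w_1..w_{i-1}) - log2 P(w_1..w_i), a log ratio of successive prefix probabilities. Because each prefix probability sums over every analysis still consistent with the prefix, surprisal measures how much probability mass a word prunes, which ties expectation to probabilistic disambiguation. The following is a minimal sketch under that assumption; the toy list of analyses and its probabilities are hypothetical, standing in for a probabilistic grammar, and the helper names are illustrative only.

```python
import math

# Hypothetical toy distribution over complete analyses: (words, structure, probability).
# The numbers are made up for illustration and sum to 1; in practice they would
# come from a probabilistic grammar rather than a hand-written list.
ANALYSES = [
    (("the", "horse", "raced", "past", "the", "barn"), "main verb", 0.60),
    (("the", "horse", "raced", "past", "the", "barn", "fell"), "reduced relative", 0.05),
    (("the", "horse", "fell"), "main verb", 0.35),
]


def prefix_prob(prefix, analyses):
    """Total probability of all analyses whose word string begins with `prefix`."""
    n = len(prefix)
    return sum(p for words, _, p in analyses if words[:n] == tuple(prefix))


def surprisals(sentence, analyses):
    """Per-word surprisal in bits: log2 P(w_1..w_{i-1}) - log2 P(w_1..w_i)."""
    result = []
    for i in range(1, len(sentence) + 1):
        before = prefix_prob(sentence[:i - 1], analyses)
        after = prefix_prob(sentence[:i], analyses)
        # A prefix that no analysis allows is infinitely surprising.
        result.append(math.inf if after == 0 else math.log2(before / after))
    return result


sentence = ("the", "horse", "raced", "past", "the", "barn", "fell")
for word, bits in zip(sentence, surprisals(sentence, ANALYSES)):
    print(f"{word:>6}  {bits:.2f} bits")
```

With these made-up numbers, "raced" costs log2(1/0.65) ≈ 0.62 bits because it rules out the "the horse fell" analysis, and the final "fell" costs log2(0.65/0.05) = log2 13 ≈ 3.70 bits because it eliminates the dominant main-verb analysis.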
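For contrast, locality-based theories tie difficulty to how far a word is from the dependents it must integrate with. The sketch below uses a deliberately simplified distance count, loosely in the spirit of dependency-locality accounts: the cost at a head is the number of discourse-referent words (here, nouns) strictly between it and each preceding dependent. The glossed German-style verb-final clauses, their dependency arcs, and the referent sets are hypothetical, and the metric omits many details of published locality theories.

```python
def locality_cost(head, arcs, referents):
    """Simplified locality cost at `head` (a word index).

    For each dependent that precedes the head, count the discourse-referent
    words strictly between them. This is a toy stand-in for published
    locality metrics, not a faithful implementation of any of them.
    """
    return sum(1
               for dep, h in arcs if h == head and dep < head
               for i in range(dep + 1, head) if i in referents)


# Hypothetical glossed verb-final clause: "that the reporter the senator attacked".
SHORT_WORDS = ["that", "the", "reporter", "the", "senator", "attacked"]
SHORT_ARCS = [(1, 2), (2, 5), (3, 4), (4, 5)]  # (dependent, head) index pairs
SHORT_REFS = {2, 4}  # nouns introducing discourse referents

# The same clause with a hypothetical preverbal PP: "... the senator in the hearing attacked".
LONG_WORDS = ["that", "the", "reporter", "the", "senator", "in", "the", "hearing", "attacked"]
LONG_ARCS = [(1, 2), (2, 8), (3, 4), (4, 8), (5, 8), (6, 7), (7, 5)]
LONG_REFS = {2, 4, 7}

short_cost = locality_cost(SHORT_WORDS.index("attacked"), SHORT_ARCS, SHORT_REFS)
long_cost = locality_cost(LONG_WORDS.index("attacked"), LONG_ARCS, LONG_REFS)
print("cost at verb, short clause:", short_cost)
print("cost at verb, long clause:", long_cost)
```

Under this metric, adding the preverbal phrase raises the cost at the clause-final verb from 1 to 4. An expectation-based account need not predict a corresponding slowdown, since additional preverbal material can sharpen expectations about the upcoming verb; verb-final configurations of this kind are where the two classes of theory can come apart.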
Discontinuity poses a computational challenge for parsing because it expands the set of possible substructures in a sentence beyond the set of possible continuous constituents, whose size is bounded by a quadratic function of sentence length. I investigate two problems posed by discontinuous constituency. For accuracy, I employ discriminative classifiers, organized on principles of syntactic theory, that introduce discontinuous relationships into otherwise strictly context-free phrase structure trees. For efficiency, I investigate joint inference over both continuous and discontinuous structures, using probabilistic instantiations of mildly context-sensitive grammatical formalisms and factoring grammatical generalizations into probabilistic components of dominance and linear order.
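The quadratic bound, and how discontinuity breaks it, can be checked by direct counting. This minimal sketch assumes a constituent's yield is any set of word positions and classifies yields by their number of maximal contiguous blocks (one block is a continuous constituent; two blocks is a constituent with one gap). A brute-force count over all subsets confirms that an n-word sentence has n(n+1)/2 continuous yields, C(n+1, 4) yields with exactly two blocks, and in general C(n+1, 2k) yields with k blocks, so a single gap already raises the count from O(n^2) to O(n^4); with no bound on gaps, every one of the 2^n - 1 nonempty subsets is a candidate. The function names are illustrative.

```python
from itertools import combinations
from math import comb


def num_blocks(positions):
    """Number of maximal contiguous runs in a sorted tuple of word indices."""
    blocks = 0
    for i, p in enumerate(positions):
        if i == 0 or p != positions[i - 1] + 1:
            blocks += 1
    return blocks


def count_yields(n, fan_out):
    """Brute force: subsets of n word positions with exactly `fan_out` blocks."""
    return sum(1
               for size in range(1, n + 1)
               for subset in combinations(range(n), size)
               if num_blocks(subset) == fan_out)


for n in (5, 10, 15):
    print(f"n={n:2d}  continuous={count_yields(n, 1):4d} (formula {comb(n + 1, 2)})"
          f"  two-block={count_yields(n, 2):5d} (formula {comb(n + 1, 4)})")
```

The closed form follows from choosing 2k distinct boundary points among the n + 1 gaps between and around words; brute force is used here only to confirm it independently.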
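The accuracy strategy, in which a context-free parse is produced first and a discriminative classifier then adds discontinuous links, can be illustrated with a minimal sketch. It assumes a log-linear (maximum-entropy) classifier that chooses the antecedent NP of an extraposed relative clause left flatly attached under S. The tree, the feature names (`distance`, `has_determiner`), and the weights are all hypothetical and are not the thesis's feature set; a real system would estimate weights from a treebank.

```python
import math

# Hypothetical toy parse of "a man arrived yesterday who wore a hat", with the
# extraposed relative clause (RC) attached flatly under S, as a context-free
# parser might output it. Trees are (label, child, ...); preterminals are (tag, word).
TREE = ("S",
        ("NP", ("DT", "a"), ("NN", "man")),
        ("VP", ("VBD", "arrived"), ("NP", ("NN", "yesterday"))),
        ("RC", ("WP", "who"), ("VBD", "wore"),
               ("NP", ("DT", "a"), ("NN", "hat"))))

# Hypothetical feature weights, chosen by hand for illustration.
WEIGHTS = {"distance": -0.5, "has_determiner": 1.5}


def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)


def phrases(node, start=0, out=None):
    """Return (end index, list of (label, start, end, node)) for all phrasal nodes."""
    if out is None:
        out = []
    if is_preterminal(node):
        return start + 1, out
    pos = start
    for child in node[1:]:
        pos, _ = phrases(child, pos, out)
    out.append((node[0], start, pos, node))
    return pos, out


def words(node):
    if is_preterminal(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]


def features(np_node, np_end, rc_start):
    """Hypothetical features for linking the RC to a candidate antecedent NP."""
    return {"distance": float(rc_start - np_end),
            "has_determiner": 1.0 if np_node[1][0] == "DT" else 0.0}


def antecedent_distribution(tree):
    """Softmax distribution over NPs that end at or before the start of the RC."""
    _, nodes = phrases(tree)
    rc_start = next(s for label, s, _, _ in nodes if label == "RC")
    candidates = [(n, e) for label, _, e, n in nodes if label == "NP" and e <= rc_start]
    scores = [sum(WEIGHTS[k] * v for k, v in features(n, e, rc_start).items())
              for n, e in candidates]
    z = sum(math.exp(s) for s in scores)
    return [(" ".join(words(n)), math.exp(s) / z) for (n, _), s in zip(candidates, scores)]


dist = antecedent_distribution(TREE)
best, _ = max(dist, key=lambda pair: pair[1])
print(dist)
print("predicted antecedent:", best)
```

The predicted antecedent ("a man") would then be linked to the RC, turning the flat context-free attachment into a discontinuous NP. Framing the task as a choice among candidate sites is what lets a discriminative model use arbitrary, overlapping features of the tree.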
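The factoring of generalizations into dominance and linear order can be illustrated with an ID/LP-style decomposition, assuming that a rule's probability is the product of a dominance term (which daughters a mother has, unordered) and a linear-order term (how those daughters are arranged). The categories and probabilities below are hypothetical, and the sketch shows only the factorization itself, not a parser for a mildly context-sensitive formalism.

```python
from itertools import permutations

# Hypothetical dominance (ID) component: P(unordered daughters | mother).
# Daughter multisets are stored as sorted tuples.
ID_PROBS = {
    "S": {("NP", "VP"): 0.7, ("NP", "NP", "V"): 0.3},
}

# Hypothetical linear-order (LP) component: P(order | mother, daughters).
LP_PROBS = {
    ("S", ("NP", "VP")): {("NP", "VP"): 0.9, ("VP", "NP"): 0.1},
    ("S", ("NP", "NP", "V")): {("NP", "V", "NP"): 0.6, ("NP", "NP", "V"): 0.4},
}


def rule_prob(mother, ordered_daughters):
    """P(ordered rule) = P(dominance) * P(linear order | dominance)."""
    key = tuple(sorted(ordered_daughters))
    p_dom = ID_PROBS.get(mother, {}).get(key, 0.0)
    p_order = LP_PROBS.get((mother, key), {}).get(tuple(ordered_daughters), 0.0)
    return p_dom * p_order


def total_mass(mother):
    """Sum over every ordering of every daughter multiset; should equal 1."""
    return sum(rule_prob(mother, order)
               for key in ID_PROBS[mother]
               for order in set(permutations(key)))


print("verb-medial S -> NP V NP:", rule_prob("S", ("NP", "V", "NP")))
print("verb-final  S -> NP NP V:", rule_prob("S", ("NP", "NP", "V")))
print("total probability mass:", total_mass("S"))
```

Here the verb-medial and verb-final orders of an S with daughters {NP, NP, V} share the single dominance parameter 0.3 and differ only in their order probabilities (0.3 × 0.6 = 0.18 versus 0.3 × 0.4 = 0.12), which is one way such a factorization can pool evidence across word-order variants.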