Selection and information: a class-based approach to lexical relationships

Selectional constraints are limitations on the applicability of predicates to arguments. For example, the statement "The number two is blue" may be syntactically well formed, but at some level it is anomalous: BLUE is not a predicate that can be applied to numbers. In this dissertation, I propose a new, information-theoretic account of selectional constraints. Unlike previous approaches, this proposal requires neither the identification of primitive semantic features nor the formalization of complex inferences based on world knowledge. The proposed model assumes instead that lexical items are organized in a conceptual taxonomy according to class membership, where classes are defined simply as sets (that is, extensionally) rather than in terms of explicit features or properties. Selection is formalized as a probabilistic relationship between predicates and concepts: the selectional behavior of a predicate is modeled as its distributional effect on the conceptual classes of its arguments, expressed using the information-theoretic measure of relative entropy. The use of relative entropy leads to an illuminating interpretation of what selectional constraints are: the strength of a predicate's selection for an argument is identified with the quantity of information it carries about that argument.
In addition to arguing that the model is empirically adequate, I explore its application to two problems. The first concerns a linguistic question: why some transitive verbs permit implicit direct objects ("John ate $\emptyset$") and others do not ("*John brought $\emptyset$"). It has often been observed informally that the omission of objects is connected to the ease with which the object can be inferred. I have made this observation more formal by positing a relationship between inferability and selectional constraints, and have confirmed the connection between selectional constraints and implicit objects in a set of computational experiments.
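To make the relative-entropy idea concrete, here is a minimal sketch. It assumes the strength of a predicate's selection is the divergence D(Pr(c | p) || Pr(c)) between the distribution over argument classes given the predicate and the prior distribution over classes, and that a per-class association is that class's share of the total. The class distributions for "eat" and "bring" are invented toy numbers, not corpus estimates, and all function names are hypothetical. The sketch shows how a verb whose objects cluster in one class carries more information about its argument, which is the kind of contrast the implicit-object account relies on.

```python
import math

def relative_entropy(p, q):
    """D(p || q) in bits; p and q map class names to probabilities."""
    return sum(p_c * math.log2(p_c / q[c]) for c, p_c in p.items() if p_c > 0)

def selectional_strength(posterior, prior):
    """Assumed strength of selection: divergence of Pr(c | verb) from Pr(c)."""
    return relative_entropy(posterior, prior)

def selectional_association(posterior, prior, c):
    """Hypothetical per-class share of the total selectional strength."""
    p_c = posterior.get(c, 0.0)
    if p_c == 0.0:
        return 0.0
    return p_c * math.log2(p_c / prior[c]) / selectional_strength(posterior, prior)

# Toy distributions over argument classes (illustrative only, not corpus data).
prior = {"food": 0.25, "artifact": 0.25, "person": 0.25, "abstraction": 0.25}
eat = {"food": 0.85, "artifact": 0.05, "person": 0.05, "abstraction": 0.05}
bring = {"food": 0.30, "artifact": 0.30, "person": 0.30, "abstraction": 0.10}

print(f"{selectional_strength(eat, prior):.2f}")    # 1.15
print(f"{selectional_strength(bring, prior):.2f}")  # 0.10
```

Under these toy numbers, "eat" tells us far more about its object than "bring" does, so an omitted object of "eat" is easier to infer.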
Second, I have explored the practical application of the model to resolving syntactic ambiguity. A number of authors have recently begun investigating the use of corpus-based lexical statistics in automatic parsing; the results of computational experiments using the present model suggest that lexical relationships are often better viewed in terms of underlying conceptual relationships such as selectional preference and concept similarity. Thus the information-theoretic measures proposed here can serve not only as components in a theory of selectional constraints, but also as tools for practical natural language processing.
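Concept similarity can also be given an information-theoretic reading over an extensional taxonomy. The sketch below is one such measure, assumed here for illustration: a class's probability is the share of word tokens falling in its member set, its information content is the negative log of that probability, and two words are as similar as the most informative class containing both. The taxonomy, the counts, and the function names are hypothetical toy values, and the dissertation's exact estimation procedure may differ.

```python
import math

# Hypothetical toy taxonomy: each class is defined extensionally, as a set of words.
CLASSES = {
    "entity": {"apple", "bread", "hammer", "knife", "telescope", "man"},
    "food": {"apple", "bread"},
    "artifact": {"hammer", "knife", "telescope"},
    "tool": {"hammer", "knife"},
    "instrument": {"telescope"},
    "person": {"man"},
}

# Hypothetical word counts (not corpus data).
COUNTS = {"apple": 30, "bread": 20, "hammer": 10, "knife": 15, "telescope": 5, "man": 20}

def class_probability(cls):
    """Share of all word tokens that belong to the class's member set."""
    total = sum(COUNTS.values())
    return sum(COUNTS[w] for w in CLASSES[cls]) / total

def information_content(cls):
    """Information content in bits: rarer (more specific) classes score higher."""
    return -math.log2(class_probability(cls))

def similarity(w1, w2):
    """Information content of the most informative class containing both words."""
    shared = [c for c, members in CLASSES.items() if w1 in members and w2 in members]
    return max((information_content(c) for c in shared), default=0.0)

print(f"{similarity('hammer', 'knife'):.2f}")      # 2.00
print(f"{similarity('hammer', 'telescope'):.2f}")  # 1.74
```

A measure of this kind lets evidence observed for one noun (say, "knife" as an instrument) bear on a related noun ("hammer") that is rare or unseen in the training data, which is the sense in which conceptual relationships can smooth purely lexical statistics.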