Empirical acquisition of conceptual distinctions via dictionary definitions

This thesis addresses the automatic acquisition of conceptual distinctions using empirical methods, with an emphasis on semantic relations. The goal is to improve semantic lexicons for computational linguistics, although the work also applies to general-purpose knowledge bases. The approach analyzes dictionary definitions to extract the distinguishing information (i.e., the differentia) for concepts relative to their sibling concepts. A two-step process decouples the parsing of definitions from the disambiguation of syntactic relations into their underlying semantic relations. Previous approaches tend to combine these steps through pattern matching geared to particular types of relations. Here, in contrast, a broad-coverage parser first determines the syntactic relationships, and statistical classification techniques then disambiguate those relationships into their underlying semantics. The thesis makes several contributions. First, it introduces an empirical methodology for the extraction and disambiguation of semantic relations from dictionary definitions. Second, it introduces a statistical representation for these semantic relations using Bayesian networks, which are popular in artificial intelligence for representing probabilistic dependencies. Third, it shows that word-sense disambiguation can be improved by augmenting a standard statistical classifier with a probabilistic spreading-activation system that uses the semantic information extracted by this process.
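
The two-step idea can be illustrated with a minimal sketch, which is not the thesis's implementation. It assumes a toy stand-in for the broad-coverage parser (a hypothetical function, extract_syntactic_relations, that only spots "word preposition word" patterns) and a small naive Bayes classifier in place of whatever statistical classifier the thesis actually uses. The relation labels ("purpose", "has-part"), the features, and the training examples are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Step 1 (stand-in): a real system would run a broad-coverage parser here.
# This toy function only finds "<word> <preposition> <word>" patterns.
PREPOSITIONS = {"for", "with", "of", "in"}
ARTICLES = {"a", "an", "the"}


def extract_syntactic_relations(definition):
    """Return (head, preposition, object) triples found in a definition."""
    tokens = [t.strip(".,;").lower() for t in definition.split()]
    tokens = [t for t in tokens if t and t not in ARTICLES]
    relations = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] in PREPOSITIONS:
            relations.append((tokens[i - 1], tokens[i], tokens[i + 1]))
    return relations


# Step 2: statistical disambiguation of each syntactic relation into a
# semantic relation, here with naive Bayes and add-one smoothing.
class NaiveBayesRelationClassifier:
    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    @staticmethod
    def features(relation):
        # Hypothetical features: the preposition and the object's suffix.
        _head, prep, obj = relation
        return [f"prep={prep}", f"obj_suffix={obj[-3:]}"]

    def train(self, labeled_relations):
        for relation, label in labeled_relations:
            self.label_counts[label] += 1
            for feature in self.features(relation):
                self.feature_counts[label][feature] += 1
                self.vocabulary.add(feature)

    def classify(self, relation):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for feature in self.features(relation):
                score += math.log((self.feature_counts[label][feature] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training data: syntactic triples paired with semantic labels.
TRAINING = [
    (("tool", "for", "cutting"), "purpose"),
    (("device", "for", "measuring"), "purpose"),
    (("container", "with", "lid"), "has-part"),
    (("animal", "with", "fur"), "has-part"),
]

classifier = NaiveBayesRelationClassifier()
classifier.train(TRAINING)

if __name__ == "__main__":
    for rel in extract_syntactic_relations("an instrument for writing"):
        print(rel, "->", classifier.classify(rel))
```

Keeping the two steps separate means the parser stand-in can be replaced without touching the classifier, and new relation types need only new labeled examples rather than new hand-written patterns.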
