Word Sense Discovery and Disambiguation

This work is based on the hypothesis, proposed by Zellig S. Harris (1954, 1968), that words with similar syntactic usage have similar meanings. We study the hypothesis from two directions: first, different meanings (word senses) of a word should manifest themselves in different usages (contexts); second, similar usages (contexts) should lead to similar meanings (word senses).
If we start with the different meanings of a word, we should be able to find distinct contexts for those meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publications 1, 2 and 3). Because differences in context are the only means we have of separating word senses, we must ask how best to represent contexts in order to induce effective context classifiers.
If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both cases, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case, we also aim to find structure in the lists by discovering groups of similar words, e.g., synonym sets.
In this introduction to the publications of the thesis, we consider the broader background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and the discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications.
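The first direction, separating the senses of a word by grouping its contexts, can be illustrated with a toy sketch. It assumes bag-of-words window contexts, a hypothetical stopword list and a small spherical k-means with a deterministic initialisation; it is not the context representation or the classifiers used in Publications 1, 2 and 3. The function names and example sentences are illustrative only.

```python
import math
from collections import Counter

# Hypothetical stopword list; a real system would use a curated one.
STOPWORDS = {"the", "a", "an", "of", "on", "in", "at", "to", "and", "for",
             "he", "she", "they", "with"}


def context_vector(sentence, target, window=3):
    """Bag-of-words counts of the content words within `window` positions of `target`."""
    tokens = [t for t in sentence.lower().split() if t not in STOPWORDS]
    i = tokens.index(target)  # assumes the target occurs once per sentence
    return Counter(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * v[w] for w, count in u.items() if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def discover_senses(sentences, target, k=2, iterations=10):
    """Group the contexts of `target` into k clusters with a tiny spherical k-means."""
    vectors = [context_vector(s, target) for s in sentences]
    # Deterministic farthest-point initialisation: start from the first context and
    # repeatedly add the context least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assign each context to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # A centroid is the summed vector of its members (cosine ignores scale);
        # an empty cluster keeps its previous centroid.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = sum(members, Counter())
    return labels


SENTENCES = [
    "they walked along the river bank near the water",
    "she deposited money at the bank to repay the loan",
    "the boat drifted toward the muddy river bank and the water",
    "the bank charged interest on the loan and money",
]

if __name__ == "__main__":
    print(discover_senses(SENTENCES, "bank"))  # → [0, 1, 0, 1]
```

The two river contexts share "river" and "water" and the two financial contexts share "loan", so the clusters fall along the sense boundary. With realistic data the choice of context representation matters far more than in this toy case, which is exactly the question raised above.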
This work supports Harris’ hypothesis by implementing three new methods modeled on it. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation.
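The second direction, discovering related words from similar contexts, underlies the construction of thesaurus-like word lists. The following is a minimal sketch, assuming simple window-based co-occurrence counts and cosine similarity; this is a common choice in the distributional similarity literature, not necessarily the measure used in Publication 4. The helper names, stopword list and toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stopword list for the toy corpus below.
STOPWORDS = {"the", "a", "to", "at", "he", "she", "near"}


def cooccurrence_vectors(sentences, window=2):
    """Map each content word to counts of the content words within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in sentence.lower().split() if t not in STOPWORDS]
        for i, word in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors[word].update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * v[w] for w, count in u.items() if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def most_similar(vectors, word, n=3):
    """Rank the other words by the cosine similarity of their context vectors."""
    target = vectors.get(word, Counter())  # .get avoids inserting into the defaultdict
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    # sorted() is stable, so ties keep their first-seen corpus order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


CORPUS = [
    "she drove the car to work",
    "she drove the automobile to work",
    "he parked the car near the office",
    "he parked the automobile near the office",
    "she ate the banana at lunch",
]

if __name__ == "__main__":
    vectors = cooccurrence_vectors(CORPUS)
    print(most_similar(vectors, "car", n=1))  # → [('automobile', 1.0)]
```

On this toy corpus "car" and "automobile" share all their contexts and score 1.0, while "drove" and "work" also receive nonzero scores. Window-based similarity surfaces related words as well as synonyms, which is why finding structure in the resulting lists, e.g., synonym sets, is treated as a separate step.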

[1]  Nello Cristianini,et al.  An introduction to Support Vector Machines , 2000 .

[2]  Betty Kirkpatrick,et al.  Roget's Thesaurus , 1852 .

[3]  Zhang Min,et al.  Direct orthographical mapping for machine transliteration , 2004, COLING 2004.

[4]  Atro Voutilainen.  Three studies of grammar-based surface parsing of unrestricted English text , 1994 .

[5]  Lauri Carlson,et al.  Unification as a Grammatical Tool , 1987, Nordic Journal of Linguistics.

[6]  Dan Klein,et al.  Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency , 2004, ACL.

[7]  Colin Yallop,et al.  The Macquarie Dictionary , 1996, English Today.

[8]  Mark Fischetti,et al.  Weaving the web - the original design and ultimate destiny of the World Wide Web by its inventor , 1999 .

[9]  M. F.,et al.  Bibliography , 1985, Experimental Gerontology.

[10]  T. Kohonen,et al.  Self-organizing semantic maps , 1989, Biological Cybernetics.

[11]  Chris Mellish,et al.  Natural Language Processing in Pop-11: An Introduction to Computational Linguistics , 1989 .

[12]  Krister Lindén.  Finding Cross-Lingual Spelling Variants , 2004, SPIRE.

[13]  Adam Kilgarriff,et al.  What’s in a Thesaurus? , 2000, LREC.

[14]  Adam Kilgarriff,et al.  Gold standard datasets for evaluating word sense disambiguation programs , 1998, Comput. Speech Lang.

[15]  Graeme Hirst,et al.  Building and Using a Lexical Knowledge Base of Near-Synonym Differences , 2006, Computational Linguistics.

[16]  Joshua B. Tenenbaum,et al.  The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth , 2001, Cogn. Sci.

[17]  Donna K. Harman,et al.  The Text REtrieval Conference (TREC) , 1999, NTCIR.

[18]  Carlo Strapparava,et al.  The role of domain information in Word Sense Disambiguation , 2002, Natural Language Engineering.

[19]  Ellen M. Voorhees,et al.  Disambiguating Highly Ambiguous Words , 1998, CL.

[20]  Gregory Grefenstette,et al.  Estimation of English and non-English Language Use on the WWW , 2000, RIAO.

[21]  Adam Kilgarriff,et al.  Introduction to the Special Issue on the Web as Corpus , 2003, CL.

[22]  Will Lowe.  Semantic Representation and Priming in a Self-organizing Lexicon , 1997, NCPW.

[23]  Abraham Kaplan,et al.  An experimental study of ambiguity and context , 1955, Mech. Transl. Comput. Linguistics.

[24]  Michael Kearns,et al.  A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split , 1995, Neural Computation.

[25]  G. Zipf.  The Psycho-Biology of Language: An Introduction to Dynamic Philology , 1999 .

[26]  Kevin Knight,et al.  Machine Transliteration , 1997, CL.

[27]  Hozumi Tanaka,et al.  A hybrid back-transliteration system for Japanese , 2004, COLING.

[28]  Dekang Lin,et al.  Automatic Retrieval and Clustering of Similar Words , 1998, ACL.

[29]  Atro Voutilainen,et al.  Engcg Tagger, Version 2 , .

[30]  Adam Kilgarriff,et al.  Introduction to the special issue on evaluating word sense disambiguation systems , 2002, Natural Language Engineering.

[31]  Adam Kilgarriff,et al.  The Senseval-3 English lexical sample task , 2004, SENSEVAL@ACL.

[32]  Adam Kilgarriff,et al.  "I Don’t Believe in Word Senses" , 1997, Comput. Humanit.

[33]  W. N. Locke,et al.  Machine Translation of Languages , 1956 .

[34]  Patrick Pantel,et al.  Clustering by committee , 2003 .

[35]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[36]  Daniel Gildea,et al.  Automatic Labeling of Semantic Roles , 2000, ACL.

[37]  Cyril Goutte,et al.  Note on Free Lunches and Cross-Validation , 1997, Neural Computation.

[38]  Margaret Masterman,et al.  Language, Cohesion and Form: The potentialities of a mechanical thesaurus , 2005 .

[39]  Gregory Grefenstette,et al.  Explorations in automatic thesaurus discovery , 1994 .

[40]  Adam Kilgarriff,et al.  The Sketch Engine , 2004 .

[41]  Ganesh Ramakrishnan,et al.  A gloss-centered algorithm for disambiguation , 2004, SENSEVAL@ACL.

[42]  Hwee Tou Ng,et al.  Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach , 1996, ACL.

[43]  Turid Hedlund,et al.  Dictionary-Based Cross-Language Information Retrieval: Problems, Methods, and Research Findings , 2001, Information Retrieval.

[44]  George A. Miller,et al.  Using Corpus Statistics and WordNet Relations for Sense Identification , 1998, CL.

[45]  Dominic Widdows,et al.  Visualisation Techniques for Analysing Meaning , 2002, TSD.

[46]  Gregory Grefenstette,et al.  Cross-Language Information Retrieval , 1998, The Springer International Series on Information Retrieval.

[47]  R. Harald Baayen,et al.  Word Frequency Distributions , 2001 .

[48]  G. W. Snedecor.  Statistical Methods , 1964 .

[49]  Dominic Widdows,et al.  Discovering Corpus-Specific Word Senses , 2003, EACL.

[50]  Lillian Lee,et al.  Measures of Distributional Similarity , 1999, ACL.

[51]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[52]  Kenneth Ward Church,et al.  Work on Statistical Methods for Word Sense Disambiguation , 1992 .

[53]  R. Darnell.  Translation , 1873, The Indian medical gazette.

[54]  Jian Su,et al.  Direct Orthographical Mapping for Machine Transliteration , 2004, COLING.

[55]  Timo Lahtinen,et al.  Automatic indexing: an approach using an index term corpus and combining linguistic and statistical methods , 2000 .

[56]  Ying Zhang,et al.  Using the web for automated translation extraction in cross-language information retrieval , 2004, SIGIR '04.

[57]  Hinrich Schütze.  Context Space , 2001 .

[58]  L. Gleitman.  The Structural Sources of Verb Meanings , 2020, Sentence First, Arguments Afterward.

[59]  Philip Resnik,et al.  Distinguishing systems and distinguishing senses: new evaluation methods for Word Sense Disambiguation , 1999 .

[60]  David Yarowsky,et al.  Modeling Consensus: Classifier Combination for Word Sense Disambiguation , 2002, EMNLP.

[61]  Philip J. Stone,et al.  Extracting Information. (Book Reviews: The General Inquirer. A Computer Approach to Content Analysis) , 1967 .

[62]  George A. Miller,et al.  WordNet: A Lexical Database for the English Language , 2002 .

[63]  Tom Brøndsted,et al.  Sprog og multimedier , 1997 .

[64]  Paul Pimsleur.  Semantic frequency counts , 1957, Mech. Transl. Comput. Linguistics.

[65]  Yorick Wilks,et al.  Can We Make Information Extraction More Adaptive , 1999 .

[66]  W. Bruce Croft,et al.  Phrasal translation and query expansion techniques for cross-language information retrieval , 1997, SIGIR '97.

[67]  Samuel Kaski,et al.  Self organization of a massive document collection , 2000, IEEE Trans. Neural Networks Learn. Syst.

[68]  Eneko Agirre,et al.  Knowledge Sources for Word Sense Disambiguation , 2001, TSD.

[69]  Hinrich Schütze,et al.  Automatic Word Sense Discrimination , 1998, Comput. Linguistics.

[70]  Marshall S. Smith,et al.  The general inquirer: A computer approach to content analysis. , 1967 .

[71]  Ellen M. Voorhees,et al.  Learning context to disambiguate word senses , 1992 .

[72]  Nancy Ide,et al.  Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art , 1998, Comput. Linguistics.

[73]  Gerald Gazdar,et al.  Natural Language Processing in PROLOG: An Introduction to Computational Linguistics , 1989 .

[74]  Julie Elizabeth Weeds,et al.  Measures and applications of lexical distributional similarity , 2003 .

[75]  Samuel Kaski,et al.  Associative clustering for exploring dependencies between functional genomics data sets , 2005, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[76]  David Yarowsky,et al.  A method for disambiguating word senses in a large corpus , 1992, Comput. Humanit.

[77]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[78]  Xin Yao,et al.  Diversity creation methods: a survey and categorisation , 2004, Inf. Fusion.

[79]  Eneko Agirre,et al.  One Sense per Collocation and Genre/Topic Variations , 2000, EMNLP.

[80]  Adam Kilgarriff.  The Language of Word Meaning: Generative Lexicon Meets Corpus Data: The Case of Nonstandard Word Uses , 2001 .

[81]  Naftali Tishby,et al.  The information bottleneck method , 2000, ArXiv.

[82]  Ari Pirkola,et al.  Studies on Linguistic Problems and Methods in Text Retrieval: The Effects of Anaphor and Ellipsis Resolution in Proximity Searching, and Translation and Query Structuring Methods in Cross-Language Retrieval , 1999 .

[83]  Kazuhide Yamamoto,et al.  Detecting Transliterated Orthographic Variants via Two Similarity Metrics , 2004, COLING.

[84]  Samuel Kaski,et al.  Keyword selection method for characterizing text document maps , 1999 .

[85]  Julie Weeds,et al.  Automatic Identification of Infrequent Word Senses , 2004, COLING.

[86]  David Yarowsky,et al.  One Sense Per Discourse , 1992, HLT.

[87]  Sergei Nirenburg,et al.  The Mechanical Determination of Meaning , 2003 .

[88]  Mehryar Mohri,et al.  Learning from Uncertain Data , 2003, COLT.

[89]  Eneko Agirre,et al.  Combining Supervised and Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation , 2000, Comput. Humanit.

[90]  Gregory Grefenstette,et al.  Automatic transliteration for Japanese-to-English text retrieval , 2003, SIGIR.

[91]  Kari K. Pitkänen,et al.  The Spatio-Temporal Setting in Written Narrative Fiction : A Study of Interaction between Words, Text and Encyclopedic Knowledge in the Creation of Textual Meaning , 2003 .

[92]  Gerard Salton,et al.  Automatic Information Organization And Retrieval , 1968 .

[93]  Zellig S. Harris,et al.  Mathematical structures of language , 1968, Interscience tracts in pure and applied mathematics.

[94]  A. Kilgarriff,et al.  Thesauruses for natural language processing , 2003, International Conference on Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003.

[95]  James Pustejovsky,et al.  The Generative Lexicon , 1995, CL.

[96]  Timo Honkela,et al.  Newsgroup Exploration with WEBSOM Method and Browsing Interface , 1996 .

[97]  Mirella Lapata,et al.  Verb Class Disambiguation Using Informative Priors , 2004, CL.

[98]  Krister Lindén.  Language Applications with Finite-state Technology: Presentation , 1997 .

[99]  Dutch ROGET'S THESAURUS , 1979 .

[100]  Yorick Wilks,et al.  Language processing and the thesaurus , 1998 .

[101]  Mikko Kurimo,et al.  An Efficiently Focusing Large Vocabulary Language Model , 2002, ICANN.

[102]  Gina-Anne Levow.  Issues in pre- and post-translation document expansion: untranslatable cognates and missegmented words , 2003, IRAL.

[103]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[104]  Christer Samuelsson,et al.  A Statistical Theory of Dependency Syntax , 2000, COLING.

[105]  Adam Kilgarriff,et al.  How Dominant Is the Commonest Sense of a Word? , 2004, TSD.

[106]  R. H. Richens.  Interlingual Machine Translation , 1958, Comput. J.

[107]  L. Gleitman,et al.  Hard Words , 2005, Sentence First, Arguments Afterward.

[108]  James Richard Curran,et al.  From distributional to semantic similarity , 2004 .

[109]  Naftali Tishby,et al.  Distributional Clustering of English Words , 1993, ACL.

[110]  Teuvo Kohonen,et al.  Self-Organizing Maps, Second Edition , 1997, Springer Series in Information Sciences.

[111]  Ronald M. Kaplan,et al.  Lexical Functional Grammar A Formal System for Grammatical Representation , 2004 .

[112]  Kalervo Järvelin,et al.  Employing the resolution power of search keys , 2001 .

[113]  D. Id,et al.  Evaluating sense disambiguation across diverse parameter spaces , 2002 .

[114]  Patricia S. O Sullivan,et al.  100 Statistical Tests , 1995 .

[115]  Mark Sanderson,et al.  Universities of Leeds, Sheffield and York http://eprints.whiterose.ac.uk/ , 2022 .

[116]  Kalervo Järvelin,et al.  Non-adjacent Digrams Improve Matching of Cross-Lingual Spelling Variants , 2003, SPIRE.

[117]  Claire Cardie,et al.  Using clustering and SuperConcepts within SMART: TREC 6 , 1997, Inf. Process. Manag.

[118]  Atro Voutilainen.  A syntax-based part-of-speech analyser , 1995, EACL.

[119]  Adam Kilgarriff,et al.  Framework and Results for English SENSEVAL , 2000, Comput. Humanit.

[120]  Rada Mihalcea,et al.  Building a Sense Tagged Corpus with Open Mind Word Expert , 2002, SENSEVAL.

[121]  Atro Voutilainen,et al.  Comparing a Linguistic and a Stochastic Tagger , 1997, ACL.

[122]  Mark W. Davis,et al.  On The Effective Use of Large Parallel Corpora in Cross-Language Text Retrieval , 1998 .

[123]  B. Boguraev.  Book Reviews: Looking Up: An Account of the COBUILD Project in Lexical Computing , 1990, CL.

[124]  Lars Ahrenberg,et al.  Papers from the Fifth Scandinavian Conference of Computational Linguistics , 2002 .

[125]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[126]  Janyce Wiebe,et al.  Exploring attitude and affect in text : theories and applications : papers from the 2004 AAAI Symposium, March 22-24, Stanford, California , 2004 .

[127]  David Sheskin.  The McNemar Test , 2003 .

[128]  Hwee Tou Ng,et al.  An Empirical Evaluation of Knowledge Sources and Learning Algorithms for Word Sense Disambiguation , 2002, EMNLP.

[129]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[130]  W. J. Langford.  Statistical Methods , 1959, Nature.

[131]  Walter Daelemans,et al.  Evaluation of Machine Learning Methods for Natural Language Processing Tasks , 2002, LREC.

[132]  Thomas Martinetz,et al.  Topology representing networks , 1994, Neural Networks.

[133]  Adam Kilgarriff,et al.  WORD SKETCH: Extraction and Display of Significant Collocations for Lexicography , 2000 .

[134]  David Yarowsky,et al.  Minimally Supervised Morphological Analysis by Multimodal Alignment , 2000, ACL.

[135]  Christopher D. Manning,et al.  The unsupervised learning of natural language structure , 2005 .

[136]  Tanja Gaustad,et al.  Statistical Corpus-Based Word Sense Disambiguation: Pseudowords vs. Real Ambiguous Words , 2001, ACL.

[137]  Mathias Creutz,et al.  Induction of a Simple Morphology for Highly-Inflecting Languages , 2004, SIGMORPHON@ACL.

[138]  Lou Burnard,et al.  Where did we Go Wrong? A Retrospective Look at the British National Corpus , 2002 .

[139]  David Yarowsky,et al.  Combining Classifiers for word sense disambiguation , 2002, Nat. Lang. Eng.

[140]  Kimmo Koskenniemi,et al.  A General Computational Model for Word-Form Recognition and Production , 1984, ACL.

[141]  Uri Zernik,et al.  Lexical acquisition: Exploiting on-line resources to build a lexicon. , 1991 .

[142]  Samuel Kaski,et al.  Discriminative clustering , 2005, Neurocomputing.

[143]  Ted Pedersen,et al.  A Decision Tree of Bigrams is an Accurate Predictor of Word Sense , 2001, NAACL.

[144]  Timo Järvinen,et al.  A non-projective dependency parser , 1997, ANLP.

[145]  Michele Banko,et al.  Scaling to Very Very Large Corpora for Natural Language Disambiguation , 2001, ACL.

[146]  Mehryar Mohri,et al.  Rational Kernels: Theory and Algorithms , 2004, J. Mach. Learn. Res.

[147]  Beth Levin,et al.  English Verb Classes and Alternations: A Preliminary Investigation , 1993 .

[148]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[149]  Steven P. Abney.  Stochastic Attribute-Value Grammars , 1996, CL.

[150]  Ivan A. Sag,et al.  Book Reviews: Head-driven Phrase Structure Grammar and German in Head-driven Phrase-structure Grammar , 1996, CL.

[151]  W. Lowe,et al.  Towards a Theory of Semantic Space , 2001 .

[152]  Mikko Kurimo,et al.  Language model adaptation in speech recognition using document maps , 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[153]  Donald Hindle,et al.  Noun Classification From Predicate-Argument Structures , 1990, ACL.