Automatically Linking GermaNet to Wikipedia for Harvesting Corpus Examples for GermaNet Senses

A word sense is much easier to understand when its usage is illustrated by example sentences in linguistic context, which makes examples crucial for conveying the sense of a word in a dictionary. The goal of this research is the semi-automatic enrichment of senses from the German wordnet GermaNet with corpus examples from the online encyclopedia Wikipedia. The paper describes the automatic mapping of GermaNet senses to Wikipedia articles using proven, state-of-the-art word sense disambiguation methods: different versions of word overlap algorithms, PageRank, and classifiers that combine these methods. This mapping is optimized for precision and then used to automatically harvest corpus examples from Wikipedia for GermaNet senses. The paper presents details about the optimization of the model for the GermaNet-Wikipedia mapping and concludes with a detailed evaluation of the quantity and quality of the harvested examples. Apart from enriching the GermaNet resource, the harvested corpus examples can also be used to construct a corpus of German nouns annotated with GermaNet senses. Such a sense-annotated corpus can be used for a wide range of NLP applications.
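To illustrate the word overlap idea mentioned above, the following is a minimal sketch of a Lesk-style overlap score between a sense description and candidate Wikipedia articles. It is not the paper's implementation: the function names, the toy texts, the stopword set, and the `min_overlap` threshold are all assumptions made for illustration. The threshold stands in for a precision-oriented cutoff, so that a sense is left unmapped rather than mapped on weak evidence.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(sense_gloss, article_text, stopwords=frozenset()):
    """Count distinct non-stopword tokens shared by a sense gloss and an article."""
    return len((tokens(sense_gloss) & tokens(article_text)) - stopwords)


def best_article(sense_gloss, candidates, min_overlap=2, stopwords=frozenset()):
    """Return the title of the highest-scoring candidate article,
    or None if no candidate reaches min_overlap (hypothetical cutoff)."""
    best_title, best_score = None, 0
    for title, text in candidates.items():
        score = overlap_score(sense_gloss, text, stopwords)
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= min_overlap else None


# Toy data, invented for this sketch (not taken from GermaNet or Wikipedia).
STOPWORDS = frozenset({"und", "für", "ein", "eine", "aus"})
gloss = "Geldinstitut für Kredite und Konten"
candidates = {
    "Bank (Kreditinstitut)": "Ein Kreditinstitut verwaltet Konten und vergibt Kredite",
    "Bank (Möbel)": "Eine Sitzgelegenheit für mehrere Personen aus Holz",
}
mapped = best_article(gloss, candidates, min_overlap=2, stopwords=STOPWORDS)
```

Raising `min_overlap` in this sketch trades coverage for precision: fewer senses receive a mapping, but the mappings that remain rest on more shared vocabulary.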
