Two Corpus Based Experiments with the Portuguese and English Wordnets

This paper presents two experiments with real-world applications of word sense disambiguation, wordnets, and dependency parsing. The first is a step towards a Portuguese corpus annotated with wordnet senses. We manually annotated 30 sentences using OpenWordNet-PT as the lexicon and then compared the results with an automatic annotation. In addition to evaluating the system, the results provided valuable insights into how to approach such an ambitious task. The second experiment uses Princeton WordNet as part of an NLP pipeline for information extraction from technical texts in the mining domain, and discusses the issues found while integrating word sense disambiguation with a syntactic analysis of the sentences.
