Wikipedia-based WSD for multilingual frame annotation

Many natural language processing applications have been shown to achieve significant performance improvements when they exploit semantic information extracted from high-quality annotated resources. However, the practical use of such resources is often hampered by their limited coverage. Furthermore, they are generally available only for English and a few other languages. We propose a novel methodology that, starting from the mapping between FrameNet lexical units and Wikipedia pages, automatically extracts new lexical units and example sentences from Wikipedia. The goal is to build a reference data set for the semi-automatic development of new FrameNets. In addition, this methodology can be adapted to perform frame identification in any language available in Wikipedia. Our approach relies on a state-of-the-art word sense disambiguation system that is first trained on English Wikipedia to assign a page to the lexical units in a frame. This mapping is then exploited to perform frame identification in English or in any other language available in Wikipedia. Our approach shows high potential in multilingual settings, because it can be applied to languages for which other lexical resources such as WordNet or thesauri are not available.
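
The lookup chain described above (word in context, then Wikipedia page, then frame) can be illustrated with a minimal Python sketch. It is not the system described here: the disambiguation step is assumed to have already produced a page title, the use of interlanguage links to reach the English page is an assumption about how other languages are connected to the English mapping, and all names and data (`PAGE_TO_FRAME`, `INTERLANGUAGE`, `identify_frame`, the page titles and frame labels) are hypothetical toy values.

```python
from typing import Dict, Optional, Tuple

# Toy, hand-written data for illustration only; not taken from FrameNet
# or from a Wikipedia dump.
# English Wikipedia page title -> frame label (the result of mapping
# lexical units to pages).
PAGE_TO_FRAME: Dict[str, str] = {
    "Court": "Judicial_body",
    "Tennis court": "Sports_venue",
}

# Assumed interlanguage links:
# (language code, local page title) -> English page title.
INTERLANGUAGE: Dict[Tuple[str, str], str] = {
    ("it", "Tribunale"): "Court",
    ("it", "Campo da tennis"): "Tennis court",
}


def to_english_page(lang: str, title: str) -> Optional[str]:
    """Resolve a page title in any language to its English counterpart."""
    if lang == "en":
        return title
    return INTERLANGUAGE.get((lang, title))


def identify_frame(lang: str, page_title: str) -> Optional[str]:
    """Return the frame for a page chosen by a (not shown) disambiguator.

    Returns None when the page has no English counterpart or when the
    English page is not mapped to any frame.
    """
    english_title = to_english_page(lang, page_title)
    if english_title is None:
        return None
    return PAGE_TO_FRAME.get(english_title)


if __name__ == "__main__":
    print(identify_frame("it", "Tribunale"))
    print(identify_frame("en", "Tennis court"))
    print(identify_frame("it", "Pagina sconosciuta"))
```

In this sketch the only language-specific component is the link table, which is why a lookup of this kind would need no WordNet or thesaurus for the target language.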
