(Digital) Goodies from the ERC Wishing Well: BabelNet, Babelfy, Video Games with a Purpose and the Wikipedia Bitaxonomy

Multilinguality is a key feature of today’s Web, and it is this feature that we leverage and exploit in our research work at the Sapienza University of Rome’s Linguistic Computing Laboratory, which I will outline and showcase in this talk. I will start by presenting BabelNet 2.5 (Navigli and Ponzetto, 2012), available at http://babelnet.org, a very large multilingual encyclopedic dictionary and semantic network, which covers 50 languages and provides both lexicographic and encyclopedic knowledge for all the open-class parts of speech, thanks to the seamless integration of WordNet, Wikipedia, Wiktionary, OmegaWiki, Wikidata and the Open Multilingual WordNet. To construct the BabelNet network, we extract, at different stages: from WordNet, all available word senses (as concepts) and all the lexical and semantic pointers between synsets (as relations); from Wikipedia, all the Wikipages (as concepts) and the semantically unspecified relations derived from their hyperlinks. WordNet and Wikipedia overlap both in terms of concepts and relations: this overlap makes it possible to merge the two resources and thus to create a unified knowledge resource. To enable multilinguality, we collect the lexical realizations of the available concepts in different languages. Finally, we connect the multilingual Babel synsets by establishing semantic relations between them.

Next, I will present Babelfy (Moro et al., 2014), available at http://babelfy.org, a unified approach that leverages BabelNet to perform Word Sense Disambiguation (WSD) and Entity Linking (EL) in arbitrary languages, with performance on both tasks on a par with, or surpassing, that of task-specific state-of-the-art supervised systems. Babelfy works in three steps. First, given a lexicalized semantic network, we associate with each vertex, i.e., either concept or named entity, a semantic signature, that is, a set of related vertices; this is a preliminary step which needs to be performed only once, independently of the input text. Second, given a text, we extract all the linkable fragments from it and, for each of them, list the possible meanings according to the semantic network. Third, we create a graph-based semantic interpretation of the whole text by linking the candidate meanings of the extracted fragments using the previously computed semantic signatures; we then extract a dense subgraph of this representation and select the best candidate meaning for each fragment. Our experiments show state-of-the-art performance on both WSD and EL on six different datasets, including a multilingual setting.

In the third part of the talk I will present two novel approaches to large-scale knowledge acquisition and validation developed in my lab. I will first introduce video games with a purpose (Vannella et al., 2014), a novel, powerful paradigm for the large-scale acquisition and validation of knowledge and data (http://knowledgeforge.org). We demonstrate that converting games with a purpose into more traditional video games provides a fun component that motivates players to annotate for free, thereby lowering annotation costs significantly below those of crowdsourcing. Moreover, we show that video games with a purpose produce higher-quality annotations than crowdsourcing. The second approach is the Wikipedia Bitaxonomy project (Flati et al., 2014).
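To make the merging step described above more concrete, the following is a minimal sketch in Python on toy data. The resources, the hard-coded concept mapping and the translations are illustrative assumptions only; in BabelNet the mapping between Wikipedia pages and WordNet synsets is established automatically, and the multilingual lexicalizations are harvested from the integrated resources rather than listed by hand.

```python
# Hedged sketch of a BabelNet-style merge of WordNet and Wikipedia on toy data.
# The resources, the MAPPING table and the translations are illustrative
# assumptions, not BabelNet's actual construction pipeline or data.
from dataclasses import dataclass, field

@dataclass
class BabelSynset:
    senses: dict = field(default_factory=dict)   # language -> set of lexicalizations
    relations: set = field(default_factory=set)  # ids of related synsets
    sources: set = field(default_factory=set)    # provenance: "WordNet", "Wikipedia"

# Toy WordNet: synset id -> (English lemmas, semantically typed relations).
WORDNET = {
    "wn:play_n_theatre": ({"play", "drama"}, {"wn:comedy_n"}),
    "wn:comedy_n": ({"comedy"}, set()),
}

# Toy Wikipedia: page title -> (hyperlinked pages, translations from inter-language links).
WIKIPEDIA = {
    "Play_(theatre)": ({"Comedy"}, {"it": {"Opera teatrale"}, "de": {"Bühnenwerk"}}),
    "Comedy": (set(), {"it": {"Commedia"}, "de": {"Komödie"}}),
}

# Assumed page-to-synset mapping (learned automatically in the real resource).
MAPPING = {"Play_(theatre)": "wn:play_n_theatre", "Comedy": "wn:comedy_n"}

def build_babel_synsets():
    synsets = {}
    # 1. WordNet synsets become Babel synsets with English senses and typed relations.
    for sid, (lemmas, related) in WORDNET.items():
        s = synsets.setdefault(sid, BabelSynset())
        s.senses.setdefault("en", set()).update(lemmas)
        s.relations |= related
        s.sources.add("WordNet")
    # 2. Wikipedia pages are merged into the mapped synset (or become new synsets),
    #    contributing multilingual lexicalizations and unspecified hyperlink relations.
    for page, (links, translations) in WIKIPEDIA.items():
        sid = MAPPING.get(page, f"wiki:{page}")
        s = synsets.setdefault(sid, BabelSynset())
        s.senses.setdefault("en", set()).add(page.replace("_", " ").split(" (")[0])
        for lang, words in translations.items():
            s.senses.setdefault(lang, set()).update(words)
        s.relations |= {MAPPING.get(t, f"wiki:{t}") for t in links}
        s.sources.add("Wikipedia")
    return synsets

for sid, syn in build_babel_synsets().items():
    print(sid, syn.senses, syn.relations, syn.sources)
```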
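The three Babelfy steps can be illustrated with a similarly hedged sketch. The toy network, the toy lexicon and the connectivity heuristics below are assumptions made for illustration, and the densest-subgraph extraction is replaced by a simple greedy approximation; none of this is the actual Babelfy implementation.

```python
# Hedged sketch of a Babelfy-style WSD/EL pipeline on a toy semantic network.
# Toy data and scoring heuristics are illustrative assumptions only.

# Toy lexicalized semantic network: vertex -> neighbouring concepts/entities.
NETWORK = {
    "Thomas_Müller_(footballer)": {"Bayern_Munich", "Striker_(football)"},
    "Thomas_Müller_(physicist)": {"Physics"},
    "Bayern_Munich": {"Thomas_Müller_(footballer)", "Striker_(football)"},
    "Striker_(football)": {"Bayern_Munich", "Thomas_Müller_(footballer)"},
    "Physics": {"Thomas_Müller_(physicist)"},
}

# Toy lexicon: surface fragment -> candidate meanings in the network.
LEXICON = {
    "Thomas Müller": ["Thomas_Müller_(footballer)", "Thomas_Müller_(physicist)"],
    "striker": ["Striker_(football)"],
    "Munich": ["Bayern_Munich"],
}

def semantic_signature(vertex, depth=2):
    """Step 1 (offline): related vertices reachable within `depth` hops."""
    frontier, seen = {vertex}, {vertex}
    for _ in range(depth):
        frontier = {n for v in frontier for n in NETWORK.get(v, ())} - seen
        seen |= frontier
    return seen - {vertex}

def extract_fragments(text):
    """Step 2: list linkable fragments and their candidate meanings."""
    return {frag: cands for frag, cands in LEXICON.items() if frag.lower() in text.lower()}

def disambiguate(text):
    """Step 3: build a candidate graph, keep a dense subgraph, pick the best candidates."""
    candidates = extract_fragments(text)
    signatures = {c: semantic_signature(c) for cands in candidates.values() for c in cands}

    def degree(frag, cand, kept):
        # Surviving candidates of *other* fragments connected to `cand`
        # through the precomputed semantic signatures.
        return sum(1 for other_frag, cands in candidates.items() if other_frag != frag
                   for other in cands if other in kept
                   and (other in signatures[cand] or cand in signatures[other]))

    kept = set(signatures)
    # Greedy stand-in for the densest-subgraph step: repeatedly drop the
    # least-connected candidate of any still-ambiguous fragment.
    changed = True
    while changed:
        changed = False
        for frag, cands in candidates.items():
            alive = [c for c in cands if c in kept]
            if len(alive) > 1:
                kept.discard(min(alive, key=lambda c: degree(frag, c, kept)))
                changed = True
    return {frag: max((c for c in cands if c in kept),
                      key=lambda c: degree(frag, c, kept))
            for frag, cands in candidates.items()}

print(disambiguate("Thomas Müller is a striker who plays in Munich."))
# {'Thomas Müller': 'Thomas_Müller_(footballer)',
#  'striker': 'Striker_(football)', 'Munich': 'Bayern_Munich'}
```

In this toy run, coherence with the football-related candidates of the other fragments is what discards the physicist reading of "Thomas Müller", mirroring the intuition behind the graph-based interpretation described above.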

[1] Simone Paolo Ponzetto et al. Collaboratively built semi-structured content and Artificial Intelligence: The story so far. Artif. Intell., 2013.

[2] Simone Paolo Ponzetto et al. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif. Intell., 2012.

[3] Chei Sian Lee et al. A Survey and Typology of Human Computation Games. Ninth International Conference on Information Technology - New Generations, 2012.

[4] Michael Strube et al. Transforming Wikipedia into a large scale multilingual concept network. Artif. Intell., 2013.

[5] Philipp Cimiano et al. Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0. LREC, 2014.

[6] Roberto Navigli et al. It’s All Fun and Games until Someone Annotates: Video Games with a Purpose for Linguistic Annotation. TACL, 2014.

[7] Gerhard Weikum et al. MENTA: Inducing multilingual taxonomies from Wikipedia. CIKM, 2010.

[8] Roberto Navigli et al. Entity Linking meets Word Sense Disambiguation: a Unified Approach. TACL, 2014.

[9] Asunción Gómez-Pérez et al. Interchanging lexical resources on the Semantic Web. Language Resources and Evaluation, 2012.

[10] Roberto Navigli et al. Multilingual Word Sense Disambiguation and Entity Linking for Everybody. International Semantic Web Conference, 2014.

[11] Roberto Navigli et al. A Robust Approach to Aligning Heterogeneous Lexical Resources. ACL, 2014.

[12] Maria Ruiz-Casado et al. Automatic Assignment of Wikipedia Encyclopedic Entries to WordNet Synsets. AWIC, 2005.

[13] Udo Kruschwitz et al. Phrase detectives: Utilizing collective intelligence for internet-scale language resource creation. TIIS, 2013.

[14] Roberto Navigli et al. Validating and Extending Semantic Knowledge Bases using Video Games with a Purpose. ACL, 2014.

[15] Luis von Ahn. Games with a Purpose. Computer, 2006.

[16] Simone Paolo Ponzetto et al. Deriving a Large-Scale Taxonomy from Wikipedia. AAAI, 2007.

[17] Roberto Navigli. A Quick Tour of Word Sense Disambiguation, Induction and Related Approaches. SOFSEM, 2012.

[18] Roberto Navigli et al. A Large-Scale Pseudoword-Based Evaluation Framework for State-of-the-Art Word Sense Disambiguation. CL, 2014.

[19] Jens Lehmann et al. DBpedia - A crystallization point for the Web of Data. J. Web Semant., 2009.

[20] Francis Bond et al. Linking and Extending an Open Multilingual Wordnet. ACL, 2013.

[21] Brendan T. O'Connor et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. EMNLP, 2008.

[22] Roberto Navigli et al. Word sense disambiguation: A survey. CSUR, 2009.

[23] Ian H. Witten et al. Mining Meaning from Wikipedia. Int. J. Hum. Comput. Stud., 2008.

[24] Mark Dredze et al. Entity Linking: Finding Extracted Entities in a Knowledge Base. Multi-source, Multilingual Information Extraction and Summarization, 2013.

[25] Marco Baroni et al. Bootstrapping a Game with a Purpose for Commonsense Collection. TIST, 2012.

[26] N. Sadat Shami et al. Experiments on Motivational Feedback for Crowdsourced Workers. ICWSM, 2013.

[27] Gerhard Weikum et al. YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia: Extended Abstract. IJCAI, 2013.

[28] Martin Hepp et al. OntoGame: Weaving the Semantic Web by Online Games. ESWC, 2008.

[29] Lora Aroyo et al. The Semantic Web: Research and Applications. Lecture Notes in Computer Science, 2009.

[30] Christiane Fellbaum et al. Book Reviews: WordNet: An Electronic Lexical Database. CL, 1999.

[31] Paola Velardi et al. From Glossaries to Ontologies: Extracting Semantic Structure from Textual Definitions. Ontology Learning and Population, 2008.

[32] Tiziano Flati et al. Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project. ACL, 2014.

[33] Udo Kruschwitz et al. Using Games to Create Language Resources: Successes and Limitations of the Approach. The People's Web Meets NLP, 2013.

[34] Simone Paolo Ponzetto et al. BabelNet: Building a Very Large Multilingual Semantic Network. ACL, 2010.