DBpedia: A Multilingual Cross-domain Knowledge Base

The DBpedia project extracts structured information from Wikipedia editions in 97 different languages and combines this information into a large multi-lingual knowledge base covering many specific domains and general world knowledge. The knowledge base contains textual descriptions (titles and abstracts) of concepts in up to 97 languages. It also contains structured knowledge that has been extracted from the infobox systems of Wikipedias in 15 different languages and is mapped onto a single consistent ontology by a community effort. The knowledge base can be queried using the SPARQL query language and all its data sets are freely available for download. In this paper, we describe the general DBpedia knowledge base and as well as the DBpedia data sets that specifically aim at supporting computational linguistics tasks. These task include Entity Linking, Word Sense Disambiguation, Question Answering, Slot Filling and Relationship Extraction. These use cases are outlined, pointing at added value that the structured data of DBpedia provides.

[1]  Ravi Kumar,et al.  Pig latin: a not-so-foreign language for data processing , 2008, SIGMOD Conference.

[2]  Carlo Strapparava,et al.  Kernel Methods for Minimally Supervised WSD , 2009, CL.

[3]  Xianpei Han,et al.  A Generative Entity-Mention Model for Linking Entities with Knowledge Base , 2011, ACL.

[4]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[5]  Jens Lehmann,et al.  DBpedia - A crystallization point for the Web of Data , 2009, J. Web Semant..

[6]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[7]  Heng Ji,et al.  Overview of the TAC 2010 Knowledge Base Population Track , 2010 .

[8]  Paul McNamee,et al.  An Evaluation of Technologies for Knowledge Base Population , 2010, LREC.

[9]  Douglas B. Lenat,et al.  CYC: a large-scale investment in knowledge infrastructure , 1995, CACM.

[10]  Gerard Salton,et al.  A vector space model for automatic indexing , 1975, CACM.

[11]  Christian Bizer,et al.  Multipedia: enriching DBpedia with multimedia information , 2011, K-CAP '11.

[12]  Eduard H. Hovy,et al.  The Automated Acquisition of Topic Signatures for Text Summarization , 2000, COLING.

[13]  Vivi Nastase,et al.  Topic-Driven Multi-Document Summarization with Encyclopedic Knowledge and Spreading Activation , 2008, EMNLP.

[14]  Eugenio Tacchini,et al.  Experiments with Wikipedia Cross-Language Data Fusion , 2009, SFSW@ESWC.

[15]  Paolo Ferragina,et al.  TAGME: on-the-fly annotation of short text fragments (by wikipedia entities) , 2010, CIKM.

[16]  Doug Downey,et al.  Local and Global Algorithms for Disambiguation to Wikipedia , 2011, ACL.

[17]  Christian Bizer,et al.  DBpedia spotlight: shedding light on the web of documents , 2011, I-Semantics '11.