DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia

The DBpedia community project extracts structured, multilingual knowledge from Wikipedia and makes it freely available on the Web using Semantic Web and Linked Data technologies. The project extracts knowledge from 111 different language editions of Wikipedia. The largest DBpedia knowledge base, which is extracted from the English edition of Wikipedia, consists of over 400 million facts that describe 3.7 million things. The DBpedia knowledge bases extracted from the other 110 Wikipedia editions together consist of 1.46 billion facts and describe 10 million additional things. The DBpedia project maps Wikipedia infoboxes from 27 different language editions to a single shared ontology consisting of 320 classes and 1,650 properties. The mappings are created via a world-wide crowd-sourcing effort and enable knowledge from the different Wikipedia editions to be combined. The project publishes releases of all DBpedia knowledge bases for download and provides SPARQL query access to 14 of the 111 language editions via a global network of local DBpedia chapters. In addition to the regular releases, the project maintains a live knowledge base, which is updated whenever a page in Wikipedia changes. DBpedia sets 27 million RDF links pointing into over 30 external data sources and thus enables data from these sources to be used together with DBpedia data. In turn, several hundred data sets on the Web publish RDF links pointing to DBpedia, making DBpedia one of the central interlinking hubs in the Linked Open Data (LOD) cloud. In this system report, we give an overview of the DBpedia community project, including its architecture, technical implementation, maintenance, internationalisation, usage statistics and applications.
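
To make the SPARQL query access mentioned above more concrete, the following is a minimal sketch of how a client might prepare a request against a DBpedia endpoint. It is an illustration under stated assumptions, not the project's own tooling: the endpoint URL `https://dbpedia.org/sparql`, the example query, and the helper name `build_sparql_url` are all assumptions introduced here. The sketch only builds the request URL using the standard `query` parameter of the SPARQL protocol and does not contact the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: public SPARQL endpoint of the English DBpedia edition.
ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: illustrative query listing a few resources typed as dbo:City
# in the shared DBpedia ontology, with their English labels.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?city ?label WHERE {
  ?city a dbo:City ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 5
"""


def build_sparql_url(endpoint: str, query: str) -> str:
    """Hypothetical helper: encode a SPARQL query as an HTTP GET URL.

    The SPARQL protocol passes the query text in the `query` parameter;
    urlencode takes care of percent-encoding it.
    """
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Decode the `query` parameter back out of a URL (round-trip check)."""
    return parse_qs(urlsplit(url).query)["query"][0]


if __name__ == "__main__":
    url = build_sparql_url(ENDPOINT, QUERY)
    print(url)
    # The decoded parameter should match the original query text.
    print(extract_query(url) == QUERY)
```

The resulting URL could be fetched with any HTTP client, typically with an `Accept` header requesting a SPARQL results format; the same helper would apply to a local chapter's endpoint by swapping the `ENDPOINT` value.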
