DBpedia and the live extraction of structured data from Wikipedia

Purpose – DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases, and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight, and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.

Design/methodology/approach – Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia-Live publishes the newly added/deleted triples in files in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors.

Findings – During the realization of DBpedia-Live the authors learned that it is crucial to process Wikipedia updates in a priority queue. Recently updated ...
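The Findings state that processing Wikipedia updates in a priority queue is crucial. Below is a minimal sketch of such a queue in Python; the two priority tiers, the class name, and the deduplication policy are illustrative assumptions, not the tiers DBpedia-Live actually uses.

```python
import heapq
import itertools

# Illustrative priority tiers -- the abstract only states that a priority
# queue is crucial, not which tiers DBpedia-Live actually distinguishes.
PRIORITY_LIVE_UPDATE = 0   # article was just edited on Wikipedia
PRIORITY_UNMODIFIED = 1    # periodic re-extraction of untouched pages

class UpdateQueue:
    """Priority queue of Wikipedia page titles awaiting (re-)extraction."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._pending: dict[str, int] = {}  # title -> most urgent priority seen
        self._tick = itertools.count()      # tie-breaker: FIFO within a tier

    def push(self, title: str, priority: int) -> None:
        # Re-queueing a pending title only matters if the new priority is
        # more urgent; the superseded heap entry is skipped on pop.
        if self._pending.get(title, float("inf")) <= priority:
            return
        self._pending[title] = priority
        heapq.heappush(self._heap, (priority, next(self._tick), title))

    def pop(self) -> str | None:
        while self._heap:
            priority, _, title = heapq.heappop(self._heap)
            if self._pending.get(title) == priority:  # ignore stale entries
                del self._pending[title]
                return title
        return None

q = UpdateQueue()
q.push("Leipzig", PRIORITY_UNMODIFIED)   # scheduled re-extraction
q.push("Leipzig", PRIORITY_LIVE_UPDATE)  # live edit arrives and jumps the queue
assert q.pop() == "Leipzig" and q.pop() is None
```

Deduplicating by title ensures an article edited several times in quick succession is extracted once, at its most urgent priority, rather than once per edit.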
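The Design section describes publishing newly added/deleted triples in files so that mirrors can stay synchronized with the DBpedia endpoint. The sketch below shows one way such changeset files could be written and replayed via a standard SPARQL 1.1 Update; the file naming, directory layout, and helper names are assumptions for illustration, not the project's actual scheme.

```python
from pathlib import Path

def publish_changeset(batch_id: int, added: list[str], removed: list[str],
                      out_dir: Path = Path("changesets")) -> None:
    """Write one extraction batch as a pair of N-Triples files.

    `added` and `removed` hold serialized N-Triples lines; the zero-padded
    naming and directory layout are assumptions for this sketch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{batch_id:06d}.added.nt").write_text(
        "".join(t + "\n" for t in added))
    (out_dir / f"{batch_id:06d}.removed.nt").write_text(
        "".join(t + "\n" for t in removed))

def changeset_to_sparql(batch_id: int, in_dir: Path = Path("changesets")) -> str:
    """Build a SPARQL 1.1 Update that a mirror could execute to replay one
    changeset; deletions run before insertions so replaced triples end up
    in their current state."""
    removed = (in_dir / f"{batch_id:06d}.removed.nt").read_text().strip()
    added = (in_dir / f"{batch_id:06d}.added.nt").read_text().strip()
    parts = []
    if removed:
        parts.append(f"DELETE DATA {{\n{removed}\n}}")
    if added:
        parts.append(f"INSERT DATA {{\n{added}\n}}")
    return " ;\n".join(parts)

publish_changeset(
    1,
    added=['<http://dbpedia.org/resource/Leipzig> '
           '<http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@en .'],
    removed=[],
)
print(changeset_to_sparql(1))  # -> INSERT DATA { <...Leipzig> ... . }
```

Because each batch is an immutable numbered file pair, a mirror that falls behind can catch up by replaying every batch since its last applied number, in order.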
