A Metadata Focused Crawler for Linked Data

The Linked Data best practices recommend that publishers of triplesets use well-known ontologies in the triplification process and link their triplesets with other triplesets. However, although extensive lists of open ontologies and triplesets are available, most publishers do not adopt those ontologies and link their triplesets only with popular ones, such as DBpedia and Geonames. This paper presents a metadata crawler for Linked Data that assists publishers in the triplification and linkage processes. The crawler provides publishers with a list of the most suitable ontologies and vocabulary terms for triplification, as well as a list of triplesets with which the new tripleset can most likely be linked. The crawler focuses on specific metadata properties, including subclass of, and returns only metadata, hence the classification “metadata focused crawler”.
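As an illustration only, the idea of a crawl that follows selected metadata properties and returns only metadata can be sketched as below. This is a minimal sketch under stated assumptions, not the crawler described in the paper: the in-memory triple list, the `ex:` terms, the function name `crawl_metadata`, and the `max_depth` parameter are all hypothetical, and the only focus property used is `rdfs:subClassOf`, the one property the abstract names.

```python
from collections import deque

# Full IRI of rdfs:subClassOf, the one focus property named in the abstract.
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical toy tripleset; a real crawler would dereference URIs or query
# remote endpoints instead of scanning an in-memory list.
TRIPLES = [
    ("ex:Guitar", RDFS_SUBCLASS_OF, "ex:StringInstrument"),
    ("ex:StringInstrument", RDFS_SUBCLASS_OF, "ex:Instrument"),
    ("ex:Guitar", "ex:madeBy", "ex:Fender"),  # instance-level data, ignored
    ("ex:Piano", RDFS_SUBCLASS_OF, "ex:Instrument"),
]


def crawl_metadata(seeds, triples, focus_properties=(RDFS_SUBCLASS_OF,), max_depth=2):
    """Breadth-first crawl that follows only the focus properties.

    Returns the metadata triples reached within max_depth hops of the seeds,
    in discovery order. Triples with other predicates are never returned.
    """
    focus = set(focus_properties)
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    found, seen = [], set()

    while queue:
        term, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand terms at the depth limit
        for s, p, o in triples:
            if p not in focus:
                continue  # skip everything that is not a focus property
            if s == term:
                neighbor = o
            elif o == term:
                neighbor = s
            else:
                continue
            if (s, p, o) not in seen:
                seen.add((s, p, o))
                found.append((s, p, o))
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found
```

Calling `crawl_metadata(["ex:Guitar"], TRIPLES)` on the toy data returns the two subclass triples reachable within two hops and leaves out the `ex:madeBy` triple, since that predicate is not a focus property. The depth limit is a design assumption that keeps the crawl bounded.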
