CRAWLER-LD: A Multilevel Metadata Focused Crawler Framework for Linked Data

The Linked Data best practices recommend publishing a new tripleset using well-known ontologies and interlinking it with other triplesets. However, both tasks are difficult. This paper describes CRAWLER-LD, a metadata crawler that helps select the ontologies and triplesets to be used, respectively, in the publication and interlinking processes. The publisher of the new tripleset first selects a set T of terms that describe the application domain of interest. The publisher then submits T to CRAWLER-LD, which searches for triplesets whose vocabularies include terms directly or transitively related to those in T. CRAWLER-LD returns a list of ontologies to be used for publishing the new tripleset, as well as a list of triplesets with which the new tripleset can be interlinked. CRAWLER-LD focuses on specific metadata properties, including subclass of, and returns only metadata, hence the classification “metadata focused crawler”.
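
The search for directly or transitively related terms can be illustrated with a minimal sketch. It is not the CRAWLER-LD implementation: it assumes the metadata is already available as an in-memory list of (subject, property, object, tripleset) tuples, follows only the subclass-of property in one direction (from a term to its subclasses), and bounds the search by a hypothetical `max_levels` parameter. The term names, tripleset names, and the `crawl` function are invented for illustration.

```python
from collections import deque

SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical toy metadata: (subject, property, object, source tripleset).
TRIPLES = [
    ("ex:Jazz", SUBCLASS_OF, "mo:Genre", "tripleset-A"),
    ("ex:Bebop", SUBCLASS_OF, "ex:Jazz", "tripleset-B"),
    ("ex:Painting", SUBCLASS_OF, "ex:Artwork", "tripleset-C"),
]


def crawl(seed_terms, triples, properties=(SUBCLASS_OF,), max_levels=2):
    """Breadth-first search from the seed terms over the given properties.

    Returns a dict mapping each reached term to its level (0 for seeds)
    and the set of triplesets that contributed a matching triple.
    """
    levels = {term: 0 for term in seed_terms}
    triplesets = set()
    queue = deque(seed_terms)
    while queue:
        term = queue.popleft()
        level = levels[term]
        if level >= max_levels:
            continue  # do not expand terms at the deepest allowed level
        for subject, prop, obj, source in triples:
            if prop not in properties or obj != term:
                continue
            triplesets.add(source)
            if subject not in levels:
                levels[subject] = level + 1
                queue.append(subject)
    return levels, triplesets


levels, triplesets = crawl(["mo:Genre"], TRIPLES)
```

In this toy run, `ex:Jazz` is reached directly and `ex:Bebop` transitively, while `ex:Painting` is never reached because it is unrelated to the seed term. A real crawler would fetch the metadata from remote triplesets rather than from a fixed list.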
