DSCrank: A Method for Selection and Ranking of Datasets

Considerable effort has been devoted to building the Web of Data. One of the main challenges is identifying which existing datasets are most closely related to a new dataset and are therefore worth linking to. Another is publishing a local dataset on the Web of Data in accordance with the Linked Data principles. This work is based on the idea that a set of activities should guide the user through the publication of a new dataset on the Web of Data. It presents the specification and implementation of the first two of these activities: crawling a selected set of already published datasets and ranking them. The crawler follows the focused crawling approach, adapted to the Linked Data principles, and the ranking is based on a quick inspection of the content of the selected datasets. Finally, the paper presents a case study in the biomedical domain that validates the implemented approach and shows promising results with respect to scalability and performance.
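The abstract does not specify the crawling or ranking algorithms, so the following is only an illustrative sketch of how a focused crawler over linked datasets could select and rank candidates. It assumes each dataset is summarised by a few sampled labels and by the list of datasets it links to, and it uses a simple term-overlap score as a stand-in for whatever content inspection DSCrank actually performs. The function names (`tokenize`, `relevance`, `focused_crawl`), the score, the `threshold` and `max_visits` parameters, and the toy dataset identifiers are all hypothetical.

```python
import heapq
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-case word tokens (a simplification; a real system might stem or POS-tag)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(labels: List[str], query_terms: Set[str]) -> float:
    """Hypothetical score: fraction of query terms found in a dataset's sampled labels."""
    if not query_terms:
        return 0.0
    tokens: Set[str] = set()
    for label in labels:
        tokens |= tokenize(label)
    return len(tokens & query_terms) / len(query_terms)


def focused_crawl(
    seeds: List[str],
    labels: Dict[str, List[str]],
    links: Dict[str, List[str]],
    query: str,
    threshold: float = 0.3,
    max_visits: int = 100,
) -> List[Tuple[str, float]]:
    """Best-first crawl that only follows links out of relevant datasets,
    then returns the selected datasets ranked by score (highest first)."""
    query_terms = tokenize(query)
    scores: Dict[str, float] = {}
    # heapq is a min-heap, so scores are negated to pop the most relevant dataset first
    frontier = [(-relevance(labels.get(s, []), query_terms), s) for s in seeds]
    heapq.heapify(frontier)
    while frontier and len(scores) < max_visits:
        neg_score, ds = heapq.heappop(frontier)
        if ds in scores:
            continue  # already visited through another link
        scores[ds] = -neg_score
        if scores[ds] < threshold:
            continue  # not relevant: record the score but do not follow its links
        for nxt in links.get(ds, []):
            if nxt not in scores:
                heapq.heappush(frontier, (-relevance(labels.get(nxt, []), query_terms), nxt))
    selected = [(ds, s) for ds, s in scores.items() if s >= threshold]
    # Rank by descending score, breaking ties alphabetically for a stable order
    return sorted(selected, key=lambda kv: (-kv[1], kv[0]))


# Toy, invented data: dataset ids, labels and links are hypothetical.
toy_labels = {
    "drugs_ds": ["Drug", "Drug target protein"],
    "diseases_ds": ["Disease", "Gene associated with disease"],
    "geo_ds": ["City", "Country"],
    "side_effects_ds": ["Drug side effect"],
    "general_ds": ["Person", "Place", "Drug", "Disease"],
}
toy_links = {
    "drugs_ds": ["side_effects_ds", "general_ds"],
    "general_ds": ["geo_ds", "diseases_ds"],
}
ranking = focused_crawl(["drugs_ds"], toy_labels, toy_links, "drug disease")
print(ranking)
# → [('general_ds', 1.0), ('diseases_ds', 0.5), ('drugs_ds', 0.5), ('side_effects_ds', 0.5)]
```

The best-first frontier is what makes the crawl "focused": `geo_ds` is reached and scored, but because its score falls below the threshold it is neither expanded nor included in the ranking. Swapping the overlap score for a TF-IDF or other information-retrieval model would leave the crawl loop unchanged.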
