Two Approaches to the Dataset Interlinking Recommendation Problem

Whenever a dataset t is published on the Web of Data, an exploratory search over existing datasets must be performed to identify candidate datasets to interlink with t. This paper introduces and compares two approaches to the dataset interlinking recommendation problem, one based on Bayesian classifiers and the other on Social Network Analysis techniques. Both approaches define rank score functions that exploit the vocabularies, classes and properties used by the datasets, together with the known links between datasets. Extensive experiments with real-world datasets show that the rank score functions achieve a mean average precision of around 60%. In practice, this means that the exploratory search for datasets to interlink with t might be limited to the top-ranked datasets, reducing the cost of the dataset interlinking process.
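
To illustrate what a rank score function of each kind might look like, the sketch below scores candidate datasets over toy data. It is not the authors' code. The data model (each dataset described by the set of vocabulary, class and property identifiers it uses; known links given as (source, target) pairs), all function names, and the specific models are assumptions: a multinomial naive Bayes classifier stands in for the Bayesian approach, and a resource-allocation-style link-prediction index stands in for the Social Network Analysis approach. In both, a candidate u ranks higher when datasets resembling t already link to u.

```python
import math
from collections import Counter, defaultdict

# Hypothetical data model (an assumption, not taken from the paper): each dataset
# maps to the set of vocabulary, class and property identifiers it uses, and the
# known links form a set of (source, target) pairs.
Features = dict[str, set[str]]
Links = set[tuple[str, str]]


def bayes_scores(t_feats: set[str], feats: Features, links: Links) -> dict[str, float]:
    """Multinomial naive Bayes: every link target u is a class, and each known
    link (s, u) is a training example described by the features of s.
    Returns log P(u) + sum over f in t_feats of log P(f | u)."""
    prior = Counter(u for _, u in links)
    feat_counts: dict[str, Counter] = defaultdict(Counter)
    for s, u in links:
        feat_counts[u].update(feats.get(s, set()))
    vocab_size = len(set().union(*feats.values()) | t_feats)
    scores = {}
    for u, n_u in prior.items():
        total = sum(feat_counts[u].values())
        score = math.log(n_u / len(links))
        for f in t_feats:
            # Add-one smoothing: a feature never seen with u does not drive P(f | u) to 0
            score += math.log((feat_counts[u][f] + 1) / (total + vocab_size))
        scores[u] = score
    return scores


def sna_scores(t_feats: set[str], feats: Features, links: Links) -> dict[str, float]:
    """Resource-allocation-style link prediction: t is assumed to be adjacent to
    every dataset sharing at least one feature with it, and each such neighbour z
    splits one unit of score evenly among the datasets z already links to."""
    out_links: dict[str, set[str]] = defaultdict(set)
    for s, u in links:
        out_links[s].add(u)
    scores: dict[str, float] = defaultdict(float)
    for z, z_feats in feats.items():
        targets = out_links.get(z, set())
        if z_feats & t_feats and targets:
            for u in targets:
                scores[u] += 1 / len(targets)
    return dict(scores)


def rank(scores: dict[str, float]) -> list[str]:
    """Candidate datasets sorted by decreasing score."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy example (invented data): t uses only the FOAF vocabulary.
feats = {
    "a": {"foaf", "dc"},
    "b": {"foaf", "skos"},
    "dbpedia": {"owl"},
    "geonames": {"gn"},
}
links = {("a", "dbpedia"), ("b", "dbpedia"), ("b", "geonames")}
t_feats = {"foaf"}

print(rank(bayes_scores(t_feats, feats, links)))  # ['dbpedia', 'geonames']
print(rank(sna_scores(t_feats, feats, links)))    # ['dbpedia', 'geonames']
```

Summing log-probabilities rather than multiplying probabilities avoids numerical underflow when t uses many features. Note that both functions only score datasets that are already the target of at least one known link, so a dataset with no incoming links never appears in these rankings.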

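Mean average precision (MAP), the metric used to evaluate the rank score functions, can be computed as follows. For each test dataset t, average precision (AP) sums the precision at every rank where a dataset actually linked to t appears and divides by the total number of such datasets; MAP is the mean of AP over all test datasets. A minimal sketch, assuming each test case is a ranked list of candidate identifiers paired with the gold-standard set of datasets t links to; the function names and toy data are hypothetical.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Mean of precision@k over the ranks k holding a relevant dataset.
    Relevant datasets absent from the ranking contribute a precision of 0."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, dataset in enumerate(ranking, start=1):
        if dataset in relevant:
            hits += 1
            total += hits / k  # precision@k at this relevant position
    return total / len(relevant)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """Mean of the per-test-dataset AP values."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example (invented): two test datasets with their rankings and gold links.
runs = [
    (["x", "y", "z", "w"], {"x", "z"}),  # AP = (1/1 + 2/3) / 2 = 5/6
    (["p", "q"], {"q"}),                 # AP = (1/2) / 1 = 1/2
]
print(round(mean_average_precision(runs), 4))  # 0.6667
```

Dividing by the size of the gold-standard set, rather than by the number of relevant datasets actually retrieved, penalizes rankings that omit linked datasets altogether.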