Automatic Creation and Analysis of a Linked Data Cloud Diagram

Datasets published on the Web following Linked Open Data (LOD) practices have the potential to enrich other LOD datasets in multiple domains. However, the lack of descriptive information, combined with the large number of available LOD datasets, inhibits their interlinking and consumption. To facilitate such tasks, this paper proposes an automated clustering process for LOD datasets that provides an up-to-date description of the LOD cloud. The process combines metadata inspection and extraction strategies, community detection methods, and dataset profiling techniques. The clustering process is evaluated using the LOD diagram as ground truth. The results show that the proposed process can replicate the LOD diagram and identify new LOD dataset clusters. Finally, experiments conducted by LOD experts indicate that the clustering process generates dataset clusters that tend to be more descriptive than those manually defined in the LOD diagram.
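As a rough illustration of the community detection and evaluation steps, the sketch below groups datasets by a simple weighted label propagation over their link counts and scores the result against a ground-truth labelling using purity. This is not necessarily the method or the measure used in the paper: the dataset names, link counts, domain labels, and the function names `label_propagation` and `purity` are all hypothetical and chosen only for the example.

```python
from collections import Counter, defaultdict


def label_propagation(weighted_links, max_rounds=100):
    """Group datasets by a deterministic, weighted label propagation.

    weighted_links maps (dataset, dataset) pairs to a link count
    (hypothetical input format, treated as undirected).
    """
    neighbours = defaultdict(dict)
    for (source, target), weight in weighted_links.items():
        neighbours[source][target] = neighbours[source].get(target, 0) + weight
        neighbours[target][source] = neighbours[target].get(source, 0) + weight

    # Every dataset starts in its own community.
    labels = {node: node for node in neighbours}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            # Sum the link weight contributed by each neighbouring label.
            scores = Counter()
            for other, weight in neighbours[node].items():
                scores[labels[other]] += weight
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            # On ties, keep the current label if it is among the best ones.
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break

    clusters = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    return sorted(clusters.values(), key=sorted)


def purity(clusters, ground_truth):
    """Fraction of datasets that fall in the majority domain of their cluster."""
    total = sum(len(cluster) for cluster in clusters)
    matched = sum(
        Counter(ground_truth[d] for d in cluster).most_common(1)[0][1]
        for cluster in clusters
    )
    return matched / total


# Hypothetical link counts: two densely linked groups joined by one weak link.
LINKS = {
    ("media-a", "media-b"): 5, ("media-a", "media-c"): 5, ("media-b", "media-c"): 5,
    ("geo-a", "geo-b"): 5, ("geo-a", "geo-c"): 5, ("geo-b", "geo-c"): 5,
    ("media-c", "geo-a"): 1,
}
# Hypothetical domain labels standing in for the manually curated diagram.
GROUND_TRUTH = {
    "media-a": "media", "media-b": "media", "media-c": "media",
    "geo-a": "geographic", "geo-b": "geographic", "geo-c": "government",
}

if __name__ == "__main__":
    found = label_propagation(LINKS)
    print(found)
    print(round(purity(found, GROUND_TRUTH), 3))
```

In this toy input the weak link between the two groups does not merge them, and the one dataset whose ground-truth domain differs from its cluster's majority lowers the purity below 1. A cluster that disagrees with the manual labelling in this way is not necessarily wrong; it may be the kind of new or more descriptive grouping the abstract refers to.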
