Semantic Graph Analysis for Federated LOD Surfing in Life Sciences

Linked Open Data (LOD) is increasingly used to publish life science databases. To facilitate flexible use of such databases, we employ a method that performs federated query search along paths of class–class relationships. However, making federated query search effective requires an analysis of the structure that these relationships form across LOD datasets. We therefore constructed a graph of class–class relationships spanning 43 SPARQL endpoints and analyzed its connectivity. We found that (1) the sizes of connected components follow a power law, so classes should be handled separately according to the size of the connected component they belong to; (2) only the largest and second-largest connected components contain paths linking classes from two or more SPARQL endpoints, and within each of these two components the datasets share ontologies; and (3) the key classes that connect SPARQL endpoints are primarily upper-level concepts in the biological domain.
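The connectivity analysis described above can be illustrated with a minimal sketch. It assumes each class–class relationship is available as a hypothetical `(endpoint, class_a, class_b)` triple and that classes are merged by IRI, so a class used in several endpoints becomes a single node; the endpoint names and IRIs are made up. It treats a class that appears in two or more endpoints as a bridging class, which is only one possible reading of "key classes" and not necessarily the criterion used in this study. The size histogram is the kind of input one would plot on log-log axes to check for a power law; no fitting is done here.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected class graph from hypothetical (endpoint, class_a, class_b) triples.

    Assumption: classes are identified by IRI across endpoints, so a shared
    class IRI becomes one node that records every endpoint it was seen in.
    """
    adj = defaultdict(set)
    endpoints_of = defaultdict(set)
    for endpoint, a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
        endpoints_of[a].add(endpoint)
        endpoints_of[b].add(endpoint)
    return adj, endpoints_of


def connected_components(adj):
    """Return connected components (sets of classes), largest first, via BFS."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(comp)
    return sorted(components, key=len, reverse=True)


def size_histogram(components):
    """Map component size -> number of components of that size."""
    hist = defaultdict(int)
    for comp in components:
        hist[len(comp)] += 1
    return dict(sorted(hist.items()))


def summarize(components, endpoints_of):
    """Yield (size, endpoints spanned, classes shared by 2+ endpoints) per component."""
    for comp in components:
        eps = set().union(*(endpoints_of[c] for c in comp))
        bridges = sorted(c for c in comp if len(endpoints_of[c]) >= 2)
        yield len(comp), sorted(eps), bridges


# Hypothetical toy data: three endpoints, two connected components.
edges = [
    ("ep1", "ex:Gene", "ex:Protein"),
    ("ep2", "ex:Protein", "ex:Pathway"),
    ("ep3", "ex:Compound", "ex:Assay"),
]
adj, endpoints_of = build_graph(edges)
comps = connected_components(adj)
print(size_histogram(comps))  # {2: 1, 3: 1}
for row in summarize(comps, endpoints_of):
    print(row)
# (3, ['ep1', 'ep2'], ['ex:Protein'])
# (2, ['ep3'], [])
```

In this toy graph only the largest component spans more than one endpoint, and it is connected through the shared class `ex:Protein`; on real data the same summary would show which components support cross-endpoint paths.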