FedS: Towards Traversing Federated RDF Graphs

Traversing paths within a graph is a well-studied problem that is computationally hard, especially for large-scale graphs. When multiple graphs are involved, the standard practice is to merge the distinct graphs in a centralised way in order to evaluate the existence of paths between given entities (or nodes). In the biomedical domain, counting and retrieving the paths (or edges) that connect two biological entities is a highly desirable feature of graph databases. Consequently, non-standard solutions exist that count and retrieve paths from a single graph database. On the standards side, SPARQL 1.1 provides a navigational feature called Property Paths (PP), which is limited to a single RDF graph, where the existence of a path between a pair of nodes can be evaluated. In this paper, we propose a federated approach, called FedS, that retrieves paths from multiple RDF triple stores. Our key idea is to partially delegate the computational load to a set of federated RDF triple stores in a peer-to-peer manner, thus reducing the computational burden on a centralised query processing server. In our preliminary investigation, we evaluate FedS against state-of-the-art approaches that provide a path counting feature over a single RDF graph. We compare FedS against these approaches in terms of performance (overall path retrieval time) and result completeness, i.e., the number of paths retrieved.
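To make the idea of retrieving paths across several stores without merging them more concrete, the following is a minimal Python sketch. It is not the FedS algorithm itself: the `TripleStore` class, the `federated_paths` function, the hop bound, and the example identifiers are all hypothetical, and each in-memory store merely stands in for a remote RDF triple store that answers a local "outgoing edges" lookup. The coordinator only stitches together the edges returned by the stores.

```python
from collections import deque


class TripleStore:
    """Hypothetical stand-in for a remote RDF triple store."""

    def __init__(self, name, triples):
        self.name = name
        self._out = {}
        for s, p, o in triples:
            self._out.setdefault(s, []).append((p, o))

    def outgoing(self, node):
        # Assumption: a real deployment would send a SPARQL query to the
        # endpoint here; this sketch just looks the node up in memory.
        return list(self._out.get(node, []))


def federated_paths(stores, source, target, max_hops):
    """Enumerate simple paths from source to target with at most max_hops edges.

    Each path is a list of (subject, predicate, object, store_name) tuples,
    so a single path may cross several stores.
    """
    results = []
    queue = deque([(source, [])])  # (current node, edges taken so far)
    while queue:
        node, path = queue.popleft()
        if node == target and path:
            results.append(path)
            continue
        if len(path) == max_hops:
            continue
        # Nodes already on this path; skipping them keeps paths simple.
        visited = {source} | {edge[2] for edge in path}
        for store in stores:
            for predicate, obj in store.outgoing(node):
                if obj not in visited:
                    edge = (node, predicate, obj, store.name)
                    queue.append((obj, path + [edge]))
    return results


# Illustrative data only; the identifiers below are made up.
store_a = TripleStore("A", [
    ("gene:TP53", "encodes", "protein:P53"),
    ("gene:TP53", "associatedWith", "disease:Cancer"),
])
store_b = TripleStore("B", [
    ("protein:P53", "targetOf", "drug:X"),
    ("drug:X", "treats", "disease:Cancer"),
])

paths = federated_paths([store_a, store_b], "gene:TP53", "disease:Cancer", max_hops=3)
for path in paths:
    print(" -> ".join(f"{s} [{p}@{src}] {o}" for s, p, o, src in path))
```

In this sketch, the number of retrieved paths (`len(paths)`) plays the role of the result-completeness measure mentioned above, and the hop bound is one simple way to keep the enumeration finite.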
