Counting to k or How SPARQL 1.1 Property Paths Can Be Extended to Top-k Path Queries

While the volume of graph data available on the Web in RDF is steadily growing, SPARQL, the standard query language for RDF, still remains effectively unusable for the basic task of finding paths through the graph between selected nodes. Property paths, as introduced in SPARQL 1.1, are unfit for this purpose, as they can only be used to test for path existence. More expressive features, such as counting distinct paths between two nodes, have been shown to be highly intractable in the worst case, in particular in graphs with a high degree of cyclicity. Still, practical use cases demand a solution for path retrieval even when the total number of paths is prohibitively large. A common approach is to ask not for all paths, but only for the k shortest ones. In this paper, we extend SPARQL 1.1 property paths in a manner that allows computing and returning the k shortest paths that match a property path expression between two nodes. For RDF graphs in the compact HDT format, we evaluate our algorithm for top-k shortest paths and show that a relatively simple approach works in practical use cases, in fact more efficiently than other, more complex algorithms in the literature.
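To make "the k shortest paths matching a property path expression" concrete, here is a minimal Python sketch. It is not the algorithm evaluated in the paper and does not read HDT. It assumes that the property path has already been compiled into a deterministic finite automaton (DFA) over predicate names, that triples are plain strings rather than IRIs, and that only simple paths (no repeated node) are returned so the search terminates on cyclic graphs. The function `top_k_paths` and its parameters (`dfa_start`, `dfa_delta`, `dfa_accept`) are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def top_k_paths(
    triples: Iterable[Triple],
    dfa_start: int,
    dfa_delta: Dict[Tuple[int, str], int],
    dfa_accept: FrozenSet[int],
    source: str,
    target: str,
    k: int,
) -> List[List[str]]:
    """Return up to k shortest simple paths from source to target whose
    predicate sequence is accepted by the (hypothetical) DFA.

    Each path is a list alternating node, predicate, node, ..., node.
    """
    # Adjacency list: subject -> [(predicate, object), ...]
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))

    results: List[List[str]] = []
    # Breadth-first search over (node, DFA state, path). Every edge adds one
    # step, so paths leave the FIFO queue in non-decreasing length order.
    queue = deque([(source, dfa_start, [source])])
    while queue and len(results) < k:
        node, state, path = queue.popleft()
        if node == target:
            # A simple path cannot revisit the target, so stop extending here.
            if state in dfa_accept:
                results.append(path)
            continue
        for pred, obj in adj.get(node, []):
            next_state = dfa_delta.get((state, pred))
            # Skip predicates the expression does not allow in this state,
            # and nodes already on the path (simple-path assumption).
            if next_state is None or obj in path[::2]:
                continue
            queue.append((obj, next_state, path + [pred, obj]))
    return results


# Toy graph with a cycle (a -> c -> a); expression roughly knows+/worksAt.
triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("a", "knows", "c"),
    ("c", "worksAt", "x"),
    ("b", "worksAt", "x"),
    ("c", "knows", "a"),
]
# Hand-built DFA for knows+/worksAt: 0 -knows-> 1, 1 -knows-> 1, 1 -worksAt-> 2.
delta = {(0, "knows"): 1, (1, "knows"): 1, (1, "worksAt"): 2}

if __name__ == "__main__":
    for p in top_k_paths(triples, 0, delta, frozenset({2}), "a", "x", k=3):
        print(" ".join(p))
```

Because every edge counts as one step, a plain FIFO queue already yields paths shortest-first, so no priority queue is needed; ties are broken by triple order, which is a choice of this sketch. In the worst case the search still explores exponentially many partial paths, which is consistent with the intractability results mentioned above; it only stops early once k answers are found.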
