Extending SPARQL with graph functions

Much of the early domain-specific success with graph analytics has been with algorithms whose results are based on global graph structure. An example of such an algorithm is betweenness centrality, whose value for any vertex potentially depends on the number of shortest paths between all pairs of vertices in the entire graph. YarcData's Urika™ customers use SPARQL's graph-oriented pattern-matching capabilities, but many of them also need to call graph functions such as betweenness centrality. This customer feedback led us to combine SPARQL 1.1's query capabilities with classical and emerging graph-analytic algorithms (e.g., community detection, shortest path, betweenness, BadRank). With this capability, a SPARQL query can select a specific subgraph of interest, pass that subgraph to a graph algorithm for deep analysis, and then pass the results back to an enclosing SPARQL query that post-processes them as needed. With the Summer 2014 Urika release, we have extended the SPARQL implementation with a graph-function capability and a small set of built-in graph functions. We describe our design approach and our experiences with this first release, including anecdotal evidence of dramatically higher performance. Built-in graph functions represent an important step in the maturation of graph analysis and SPARQL. As common motifs emerge from use cases, those motifs may be mapped to specific graph functions that can be tuned for much higher performance than is possible in SPARQL itself. Identifying those motifs and developing the underlying graph functions to accelerate their execution is a topic of intense effort industry-wide. Graph functions merged with SPARQL provide a new mechanism by which third-party graph-algorithm developers may expose their algorithms to widespread use.
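The select, analyze, post-process flow described above can be illustrated with a minimal Python sketch. It is not Urika's implementation and does not use SPARQL syntax: the function names (`select_subgraph`, `betweenness`, `top_vertices`), the in-memory triple list, and the `knows` predicate are all hypothetical, and Brandes' algorithm on an unweighted, undirected graph is assumed here as one common way to compute betweenness centrality.

```python
from collections import deque

def select_subgraph(triples, predicate):
    """Stand-in for the inner query: keep edges with one predicate (hypothetical helper)."""
    adj = {}
    for s, p, o in triples:
        if p == predicate:
            adj.setdefault(s, set()).add(o)
            adj.setdefault(o, set()).add(s)  # treat the edge as undirected
    return adj

def betweenness(adj):
    """Stand-in for the graph function: Brandes' algorithm, unweighted and undirected."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                      # vertices in order of non-decreasing distance
        preds = {v: [] for v in adj}    # shortest-path predecessors
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}   # dependency of s on each vertex
        while stack:                    # accumulate dependencies back toward s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {v: c / 2 for v, c in bc.items()}

def top_vertices(scores, k):
    """Stand-in for the enclosing query: order by score and keep the first k."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Illustrative data only: a four-vertex path plus one triple that the selection drops.
triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "knows", "d"),
    ("a", "type", "Person"),
]
scores = betweenness(select_subgraph(triples, "knows"))
print(top_vertices(scores, 2))
```

Because every source vertex triggers a full traversal of the selected subgraph, the cost of the analysis step grows with the size of the selection, which is one reason a tuned built-in function can be attractive compared with expressing the same computation as pattern matching.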
