Computing source-to-target shortest paths for complex networks in RDBMS

Abstract

How do we deal with the exponential growth of complex networks? Can algorithms introduced decades ago still work on large network graphs? In this work, we focus on computing shortest paths (SP) from a source to a target in large network graphs. Main-memory algorithms require the graph to fit in memory, and they falter when this requirement is not met. We explore SQL-based solutions using a Relational Database Management System (RDBMS). Our approach leverages the intelligent scheduling that an RDBMS performs when executing set-at-a-time expansions of graph vertices, in contrast to the vertex-at-a-time expansions of classical SP algorithms. Our algorithms run orders of magnitude faster than the baselines, and even faster than main-memory algorithms on large graphs. We also show that our algorithms on an RDBMS outperform counterparts running on modern native graph databases such as Neo4j.
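
To make the contrast between set-at-a-time and vertex-at-a-time expansion concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is not the authors' algorithm: it is a generic frontier-expansion scheme (a Bellman-Ford-style relaxation) that assumes non-negative edge weights, and the table names (`edges`, `dist`, `frontier`, `cand`) and the functions `sql_shortest_path` and `dijkstra` are hypothetical. Each round expands every vertex whose distance improved in the previous round with a single join plus `GROUP BY ... MIN`, leaving join order and access paths to the database engine. The in-memory `dijkstra` function is the classical vertex-at-a-time baseline, which settles one vertex per priority-queue pop.

```python
import heapq
import sqlite3


def sql_shortest_path(edges, source, target):
    """Set-at-a-time source-to-target distance via SQL joins (hypothetical sketch).

    Assumes non-negative weights. Each round expands ALL vertices whose distance
    improved in the previous round with one join, instead of one vertex at a time.
    """
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("CREATE TABLE edges(src INTEGER, dst INTEGER, w REAL)")
    cur.execute("CREATE INDEX edges_src ON edges(src)")
    cur.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)
    # dist: best-known distance per vertex; frontier: vertices improved last round.
    cur.execute("CREATE TABLE dist(node INTEGER PRIMARY KEY, d REAL)")
    cur.execute("CREATE TABLE frontier(node INTEGER PRIMARY KEY, d REAL)")
    cur.execute("INSERT INTO dist VALUES (?, 0)", (source,))
    cur.execute("INSERT INTO frontier VALUES (?, 0)", (source,))
    while True:
        # Early stop: with non-negative weights, no frontier vertex can improve the target.
        (fmin,) = cur.execute("SELECT MIN(d) FROM frontier").fetchone()
        row = cur.execute("SELECT d FROM dist WHERE node = ?", (target,)).fetchone()
        if fmin is None or (row is not None and fmin >= row[0]):
            break
        # Expand the whole frontier in one set-oriented join; keep the best candidate per vertex.
        cur.execute("DROP TABLE IF EXISTS cand")
        cur.execute("""
            CREATE TABLE cand AS
            SELECT e.dst AS node, MIN(f.d + e.w) AS d
            FROM frontier f JOIN edges e ON e.src = f.node
            GROUP BY e.dst
        """)
        # New frontier: candidates that reach a new vertex or beat its current distance.
        cur.execute("DELETE FROM frontier")
        cur.execute("""
            INSERT INTO frontier
            SELECT c.node, c.d FROM cand c
            LEFT JOIN dist t ON t.node = c.node
            WHERE t.d IS NULL OR c.d < t.d
        """)
        # Merge the improvements into dist.
        cur.execute("INSERT OR REPLACE INTO dist SELECT node, d FROM frontier")
    row = cur.execute("SELECT d FROM dist WHERE node = ?", (target,)).fetchone()
    con.close()
    return None if row is None else row[0]


def dijkstra(edges, source, target):
    """Classical vertex-at-a-time baseline: one vertex settled per heap pop."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return None
```

For example, with edges `[(1, 2, 4), (1, 3, 1), (3, 2, 2), (2, 4, 1), (3, 4, 5)]`, both functions return 4.0 for the distance from vertex 1 to vertex 4 (path 1→3→2→4). The early-stop test depends on the non-negative-weight assumption: once the smallest frontier distance is no less than the target's current distance, no later round can shorten it.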
