Understanding Trolls with Efficient Analytics of Large Graphs in Neo4j

Analytics of large graph data sets has become an important means of understanding and influencing the world. The use of graph database technology in the International Consortium of Investigative Journalists’ (ICIJ) investigations of the Panama Papers and Paradise Papers, or in cancer research, illustrates how analysing graph-structured data helps to uncover important but hidden relationships. A very current example in that regard shows how graph analytics can help shed light on the operations of social media troll networks, e.g. on Twitter. In similar fashion, graph analytics can help enterprises unearth hidden patterns and structures within connected data, and so make more accurate predictions and faster decisions. All this requires efficient graph analytics that is well integrated with the management of graph data. The Neo4j Graph Platform provides such an environment. It supports both transactional and analytical processing of graph data, including data management and analytics tooling. A central element for graph analytics in the Graph Platform is Neo4j Graph Algorithms, which provides efficiently implemented, parallel versions of common graph algorithms, integrated and optimized for the Neo4j transactional database. In this paper, we describe the design and integration of Neo4j Graph Algorithms, demonstrate the utility of our approach with a Twitter troll analysis, and showcase its performance with a few experiments on large graphs.
