Distributed computing of distance‐based graph invariants for analysis and visualization of complex networks

We present a new framework for the analysis and visualization of complex networks based on structural information retrieved from their distance k‐graphs and B‐matrices. Constructing B‐matrices for graphs with more than 1 million edges requires massive Breadth‐First Search (BFS) computations, which we facilitate with new software designed for distributed environments. Our framework exploits the data parallelism inherent in the all‐pairs shortest‐path problem and extends Cassovary, an open‐source in‐memory graph processing engine, to enable multinode computation of distance k‐graphs and related graph descriptors. We also introduce a new type of B‐matrix, constructed using the clustering coefficient as the vertex invariant. It can be generated with a computational effort comparable to that required for the previously known degree B‐matrix, while delivering additional information about graph structure. Our approach enables efficient generation of expressive, multidimensional descriptors useful in graph embedding and graph mining tasks. The experiments show that the new framework is scalable and, for the specific all‐pairs shortest‐path task, performs better than existing generic graph processing frameworks. We further show how the developed tools helped in the analysis and visualization of real‐world graphs from the Stanford Large Network Dataset Collection. Copyright © 2016 John Wiley & Sons, Ltd.
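The B‐matrix construction summarized above can be illustrated with a minimal single-machine sketch. This is not the distributed Cassovary-based software. It makes the following assumptions:

* The graph is undirected and given as a Python dict of neighbour sets.
* The distance k‐graph G_k links two vertices exactly when their shortest-path distance is k.
* The degree B‐matrix counts, for each k, how many vertices have each degree in G_k.
* For the clustering coefficient variant, local clustering coefficients in G_k are histogrammed into equal-width bins. The `bins` parameter and this binning scheme are hypothetical choices made for illustration only.

Rows of both matrices correspond to k = 1, ..., diameter.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex (one BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def distance_k_graphs(adj):
    """Return {k: G_k}, where G_k links u and v iff d(u, v) == k.

    Each G_k is an adjacency dict {vertex: set(neighbours)}. BFS runs from
    different sources are independent, so this outer loop is the unit of
    work a distributed implementation could split across nodes.
    """
    layers = {}
    for s in adj:
        for v, d in bfs_distances(adj, s).items():
            if d == 0:
                continue
            if d not in layers:
                layers[d] = {u: set() for u in adj}
            layers[d][s].add(v)
    return layers


def degree_b_matrix(adj):
    """Row k-1 counts vertices by their degree in G_k (columns 0..n-1)."""
    n = len(adj)
    layers = distance_k_graphs(adj)
    rows = []
    for k in sorted(layers):
        row = [0] * n
        for nbrs in layers[k].values():
            row[len(nbrs)] += 1
        rows.append(row)
    return rows


def local_clustering(g, v):
    """Fraction of pairs of v's neighbours in g that are themselves adjacent."""
    nbrs = g[v]
    deg = len(nbrs)
    if deg < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
    return 2.0 * links / (deg * (deg - 1))


def clustering_b_matrix(adj, bins=10):
    """Row k-1 is a histogram of local clustering coefficients in G_k.

    Assumption (hypothetical binning): [0, 1] is split into `bins`
    equal-width bins, and a coefficient of exactly 1.0 goes in the last bin.
    """
    layers = distance_k_graphs(adj)
    rows = []
    for k in sorted(layers):
        row = [0] * bins
        for v in layers[k]:
            c = local_clustering(layers[k], v)
            row[min(int(c * bins), bins - 1)] += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    paw = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(degree_b_matrix(paw))              # [[0, 1, 2, 1], [1, 2, 1, 0]]
    print(clustering_b_matrix(paw, bins=2))  # [[2, 2], [4, 0]]
```

Both matrices are derived from the same per-source BFS results. A distributed version could therefore partition the sources in `distance_k_graphs` across workers and merge the partial neighbour sets. Note that the clustering variant in this sketch adds a neighbour-pair check on each G_k on top of the BFS work.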