Graphing trillions of triangles

The growing size of Big Data is often heralded, but how data are transformed and represented is just as important to knowledge discovery, and Big Graph analytics exemplifies this. Much attention has been paid to the scale of the input graph, yet the output of a graph algorithm can be many times larger than its input. This is true of many graph problems, such as listing all triangles in a graph. Enabling scalable exploration of Big Graphs requires new approaches to algorithms, architectures, and visual analytics. A brief tutorial is given to support the argument for thoughtful representation of data in graph analysis. A new algebraic method is then introduced that reduces the arithmetic operations needed to count and list triangles in graphs. A scalable triangle listing algorithm in the MapReduce model is also presented, followed by a description of the experiments with that algorithm that produced the largest and fastest triangle listing benchmarks to date. Finally, a method for identifying triangles in new visual graph exploration technologies is proposed.
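
To make the point about output size concrete, the following is a minimal sketch of triangle listing on a small in-memory graph. It is not the algebraic method or the MapReduce algorithm described above. It uses a generic neighbor-intersection approach, and the function names `list_triangles` and `complete_graph_edges` are hypothetical. The sketch assumes an undirected graph given as pairs of integer vertex labels and orders vertices by label so that each triangle is reported once.

```python
from itertools import combinations


def list_triangles(edges):
    """Return each triangle once as a tuple (u, v, w) with u < v < w."""
    # For every vertex keep only the neighbors with a larger label, so
    # each undirected edge is stored once, at its smaller endpoint.
    higher = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        higher.setdefault(a, set()).add(b)
        higher.setdefault(b, set())

    triangles = []
    for u in sorted(higher):
        for v in sorted(higher[u]):
            # Any w adjacent to both u and v, with w > v > u, closes a triangle.
            for w in sorted(higher[u] & higher[v]):
                triangles.append((u, v, w))
    return triangles


def complete_graph_edges(n):
    """Edges of the complete graph on vertices 0..n-1 (illustrative input)."""
    return list(combinations(range(n), 2))


if __name__ == "__main__":
    edges = complete_graph_edges(6)
    triangles = list_triangles(edges)
    # The complete graph on 6 vertices has 15 edges but 20 triangles.
    print(len(edges), len(triangles))
```

Even on this toy input the listing is longer than the edge list: a complete graph on n vertices has n(n-1)/2 edges and n(n-1)(n-2)/6 triangles, so the output grows faster than the input as n increases.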
