Design, Generation, and Validation of Extreme Scale Power-Law Graphs

Massive power-law graphs drive many fields: metagenomics, brain mapping, Internet of Things, cybersecurity, and sparse machine learning. The development of novel algorithms and systems to process these data requires the design, generation, and validation of enormous graphs with exactly known properties. Such graphs accelerate the proper testing of new algorithms and systems and are a prerequisite for success on real applications. Many existing random graph generators require realizing a graph in order to know its exact properties: number of vertices, number of edges, degree distribution, and number of triangles. Designing graphs with these generators is a time-consuming trial-and-error process. This paper presents a novel approach that uses Kronecker products to allow the exact computation of graph properties prior to graph generation. In addition, when a real graph is desired, it can be generated quickly in memory on a parallel computer with no interprocessor communication. To test this approach, graphs with 10^12 edges are generated on a 40,000+ core supercomputer in 1 second, and their properties exactly agree with those predicted by the theory. In addition, to demonstrate the extensibility of this approach, decetta-scale graphs with up to 10^30 edges are simulated in a few minutes on a laptop.
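The idea of computing properties before generation can be illustrated with a minimal sketch. It is not the paper's implementation; it only relies on standard Kronecker product identities: vertex counts multiply, nonzero counts multiply, row degrees multiply, and trace(A^3) (the number of closed 3-walks, which equals six times the triangle count when the graph is undirected and has no self-loops) multiplies. The star-graph factors and all function names (`star_adjacency`, `predict_properties`, `measure_properties`) are assumptions chosen for illustration.

```python
from collections import Counter
from functools import reduce

import numpy as np


def star_adjacency(num_leaves):
    """Adjacency matrix of a star: vertex 0 is the hub, joined to every leaf."""
    a = np.zeros((num_leaves + 1, num_leaves + 1), dtype=np.int64)
    a[0, 1:] = 1
    a[1:, 0] = 1
    return a


def degree_histogram(a):
    """Map each row degree to the number of vertices having that degree."""
    return Counter(int(d) for d in a.sum(axis=1))


def predict_properties(factors):
    """Properties of the Kronecker product, computed from the factors alone."""
    vertices, nonzeros, closed_walks3 = 1, 1, 1
    degrees = Counter({1: 1})
    for a in factors:
        vertices *= a.shape[0]
        nonzeros *= int(a.sum())
        closed_walks3 *= int(np.trace(a @ a @ a))
        # Degrees multiply across factors, and so do their vertex counts.
        combined = Counter()
        for d1, c1 in degrees.items():
            for d2, c2 in degree_histogram(a).items():
                combined[d1 * d2] += c1 * c2
        degrees = combined
    return {"vertices": vertices, "nonzeros": nonzeros,
            "closed_walks3": closed_walks3, "degrees": degrees}


def measure_properties(factors):
    """Same properties, measured on the explicitly realized product graph."""
    full = reduce(np.kron, factors)
    return {"vertices": full.shape[0], "nonzeros": int(full.sum()),
            "closed_walks3": int(np.trace(full @ full @ full)),
            "degrees": degree_histogram(full)}


# Hypothetical design: the product of three small star graphs.
factors = [star_adjacency(3), star_adjacency(4), star_adjacency(5)]
predicted = predict_properties(factors)
print(predicted)
print("matches realized graph:", predicted == measure_properties(factors))
```

The prediction touches only the small factor matrices, so its cost does not grow with the size of the product graph; the realized graph is built here only to check the prediction at toy scale. Star graphs are bipartite, so this particular product contains no triangles.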
