SNAP

Large networks are becoming a widely used abstraction for studying complex systems in a broad set of disciplines, ranging from social network analysis to molecular biology and neuroscience. Despite an increasing need to analyze and manipulate large networks, only a limited number of tools are available for this task. Here, we describe Stanford Network Analysis Platform (SNAP), a general-purpose, high-performance system that provides easy to use, high-level operations for analysis and manipulation of large networks. We present SNAP functionality, describe its implementational details, and give performance benchmarks. SNAP has been developed for single big-memory machines and it balances the trade-off between maximum performance, compact in-memory graph representation, and the ability to handle dynamic graphs where nodes and edges are being added or removed over time. SNAP can process massive networks with hundreds of millions of nodes and billions of edges. SNAP offers over 140 different graph algorithms that can efficiently manipulate large graphs, calculate structural properties, generate regular and random graphs, and handle attributes and meta-data on nodes and edges. Besides being able to handle large graphs, an additional strength of SNAP is that networks and their attributes are fully dynamic, they can be modified during the computation at low cost. SNAP is provided as an open source library in C++ as well as a module in Python. We also describe the Stanford Large Network Dataset, a set of social and information real-world networks and datasets, which we make publicly available. The collection is a complementary resource to our SNAP software and is widely used for development and benchmarking of graph analytics algorithms.

[1]  E. Cantoni Analysis of Robust Quasi-deviances for Generalized Linear Models , 2004 .

[2]  Aric Hagberg,et al.  Exploring Network Structure, Dynamics, and Function using NetworkX , 2008, Proceedings of the Python in Science Conference.

[3]  Reynold Xin,et al.  GraphX: a resilient distributed graph system on Spark , 2013, GRADES.

[4]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[5]  Stephen P. Boyd,et al.  Network Lasso: Clustering and Optimization in Large Graphs , 2015, KDD.

[6]  Jure Leskovec,et al.  Discovering social circles in ego networks , 2012, ACM Trans. Knowl. Discov. Data.

[7]  Bernhard Schölkopf,et al.  Structure and dynamics of information pathways in online media , 2012, WSDM.

[8]  Douglas P. Gregor,et al.  The Parallel BGL : A Generic Library for Distributed Graph Computations , 2005 .

[9]  Ashish Goel,et al.  Personalized PageRank Estimation and Search: A Bidirectional Approach , 2015, WSDM.

[10]  Ashish Goel,et al.  FAST-PPR: scaling personalized pagerank estimation for large graphs , 2014, KDD.

[11]  Albert-László Barabási,et al.  Hierarchical organization in complex networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[12]  Jure Leskovec,et al.  The Network Completion Problem: Inferring Missing Nodes and Edges in Networks , 2011, SDM.

[13]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..

[14]  Padhraic Smyth,et al.  Analysis and Visualization of Network Data using JUNG , 2005 .

[15]  Christos Faloutsos,et al.  Graphs over time: densification laws, shrinking diameters and possible explanations , 2005, KDD '05.

[16]  Christos Faloutsos,et al.  R-MAT: A Recursive Model for Graph Mining , 2004, SDM.

[17]  Eli Upfal,et al.  Stochastic models for the Web graph , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[18]  Jure Leskovec,et al.  Overlapping community detection at scale: a nonnegative matrix factorization approach , 2013, WSDM.

[19]  Chris Arney,et al.  Networks, Crowds, and Markets: Reasoning about a Highly Connected World (Easley, D. and Kleinberg, J.; 2010) [Book Review] , 2013, IEEE Technology and Society Magazine.

[20]  Gábor Csárdi,et al.  The igraph software package for complex network research , 2006 .

[21]  András A. Benczúr,et al.  SpamRank -- Fully Automatic Link Spam Detection , 2005, AIRWeb.

[22]  Jure Leskovec,et al.  {SNAP Datasets}: {Stanford} Large Network Dataset Collection , 2014 .

[23]  Pararth Shah,et al.  Ringo: Interactive Graph Analytics on Big-Memory Machines , 2015, SIGMOD Conference.

[24]  Jure Leskovec,et al.  Nonparametric Multi-group Membership Model for Dynamic Networks , 2013, NIPS.

[25]  Jure Leskovec,et al.  Community-Affiliation Graph Model for Overlapping Network Community Detection , 2012, 2012 IEEE 12th International Conference on Data Mining.

[26]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[27]  Jure Leskovec,et al.  Detecting cohesive and 2-mode communities indirected and undirected networks , 2014, WSDM.

[28]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[29]  Hai Yang,et al.  ACM Transactions on Intelligent Systems and Technology - Special Section on Urban Computing , 2014 .

[30]  Vladimir Batagelj,et al.  Generalized Cores , 2002, ArXiv.

[31]  Christos Faloutsos,et al.  Kronecker Graphs: An Approach to Modeling Networks , 2008, J. Mach. Learn. Res..

[32]  M. Newman,et al.  On the uniform generation of random graphs with prescribed degree sequences , 2003, cond-mat/0312028.

[33]  E. David,et al.  Networks, Crowds, and Markets: Reasoning about a Highly Connected World , 2010 .

[34]  Jure Leskovec,et al.  Learning to Discover Social Circles in Ego Networks , 2012, NIPS.

[35]  Jure Leskovec,et al.  Community Detection in Networks with Node Attributes , 2013, 2013 IEEE 13th International Conference on Data Mining.

[36]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[37]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[38]  Jennifer Widom,et al.  GPS: a graph processing system , 2013, SSDBM.

[39]  Rok Sosic,et al.  NIFTY: a system for large scale information flow tracking and clustering , 2013, WWW.

[40]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[41]  Jure Leskovec,et al.  Geospatial Structure of a Planetary-Scale Social Network , 2014, IEEE Transactions on Computational Social Systems.

[42]  Jure Leskovec,et al.  Latent Multi-group Membership Graph Model , 2012, ICML.

[43]  Jure Leskovec,et al.  Inferring networks of diffusion and influence , 2010, KDD.

[44]  Béla Bollobás,et al.  A Probabilistic Proof of an Asymptotic Formula for the Number of Labelled Regular Graphs , 1980, Eur. J. Comb..

[45]  Christos Faloutsos,et al.  PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[46]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[47]  Bernhard Schölkopf,et al.  Uncovering the structure and temporal dynamics of information propagation , 2014, Network Science.

[48]  Vladimir Batagelj,et al.  Pajek - Program for Large Network Analysis , 1999 .

[49]  Jure Leskovec,et al.  Overlapping Communities Explain Core–Periphery Organization of Networks , 2014, Proceedings of the IEEE.

[50]  Jure Leskovec,et al.  Multiplicative Attribute Graph Model of Real-World Networks , 2010, Internet Math..

[51]  Mark Newman,et al.  Networks: An Introduction , 2010 .

[52]  Jure Leskovec,et al.  Modeling Social Networks with Node Attributes using the Multiplicative Attribute Graph Model , 2011, UAI.

[53]  Madhav V. Marathe,et al.  PATRIC: a parallel algorithm for counting triangles in massive networks , 2013, CIKM.

[54]  Alan M. Frieze,et al.  A Geometric Preferential Attachment Model of Networks , 2006, Internet Math..

[55]  Jinha Kim,et al.  OPT: a new framework for overlapped and parallel triangulation in large-scale graphs , 2014, SIGMOD Conference.

[56]  Jure Leskovec,et al.  Meme-tracking and the dynamics of the news cycle , 2009, KDD.