Graph-XLL: a Graph Library for Extra Large Graph Analytics on a Single Machine

Graph libraries that ship ready-made algorithms are in high demand because users can apply the algorithms off the shelf for fast analytics and prototyping instead of implementing them on lower-level APIs. Besides ease of use, users also need to process extra large graphs efficiently. Popular existing graph libraries include the igraph R library and the NetworkX Python library. Although these libraries provide many off-the-shelf algorithms, their in-memory graph representation limits their scalability on large graphs. In this paper, we therefore introduce Graph-XLL, a graph library implemented in a vertex-centric manner on top of the WebGraph framework, with much lower memory requirements than igraph and NetworkX. It enables scalable analytics on extra large graphs (up to tens of millions of vertices and billions of edges) on a single consumer-grade machine within a reasonable amount of time. The same computations would cause out-of-memory errors with igraph or NetworkX.
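To make the vertex-centric idea concrete, the following is a minimal sketch in Python. It is not the Graph-XLL or WebGraph API: the function names `build_csr`, `successors`, and `pagerank` are hypothetical, and two flat integer arrays stand in for WebGraph's compressed adjacency lists, as an assumption. PageRank serves only as a familiar example. The sketch shows that the algorithm keeps only per-vertex state (two rank vectors) and visits each vertex's successor list one at a time, so the adjacency data never has to be expanded into per-vertex or per-edge objects.

```python
from array import array


def build_csr(num_vertices, edges):
    """Pack a directed edge list into two flat arrays (offsets, targets).

    Assumption: this compact layout stands in for a compressed on-disk graph.
    """
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1
    offsets = array("q", [0] * (num_vertices + 1))
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + out_degree[v]
    targets = array("q", [0] * len(edges))
    cursor = list(offsets[:num_vertices])  # next free slot for each vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def successors(offsets, targets, v):
    """Return the successor list of vertex v as a slice of the flat array."""
    return targets[offsets[v]:offsets[v + 1]]


def pagerank(offsets, targets, damping=0.85, iterations=50):
    """Vertex-centric PageRank: each vertex pushes its rank to its successors."""
    n = len(offsets) - 1
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by vertices without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in range(n) if offsets[v] == offsets[v + 1])
        nxt = [(1.0 - damping) / n + damping * dangling / n] * n
        for v in range(n):
            succ = successors(offsets, targets, v)
            if len(succ):
                share = damping * rank[v] / len(succ)
                for w in succ:
                    nxt[w] += share
        rank = nxt
    return rank


if __name__ == "__main__":
    offsets, targets = build_csr(4, [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)])
    print(pagerank(offsets, targets))
```

Working memory beyond the graph itself is proportional to the number of vertices, which is the property that lets a design of this kind scale to graphs whose edge count is far larger than their vertex count.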
