Using RAPIDS AI to Accelerate Graph Data Science Workflows

Scale-free networks are abundant in many natural, social, and engineering phenomena, and a substantial body of theory exists that elucidates many of their underlying properties. In this paper we study the scalability of some widely available Python-based tools for the empirical investigation of scale-free network data in a typical early-stage analysis pipeline. We demonstrate that porting serial implementations of commonly used pipeline data structures and methods to parallel hardware via the NVIDIA RAPIDS AI API requires minimal rewriting of code. As the performance measure for each pipeline, we recorded the time required to complete the analysis for both the serial and the parallelized workflows on a task-by-task basis. Furthermore, we review a statistically grounded methodology for fitting a power law to empirical data: location estimates are first determined with Kolmogorov-Smirnov-based methods, and the scale parameter is then inferred by maximum likelihood estimation. Our serial implementation of a typical early-stage network analysis workflow uses a combination of widely used data structures and algorithms provided by the NumPy, Pandas, and NetworkX frameworks. We then parallelized this workflow using the APIs of NVIDIA's RAPIDS AI open data science libraries and measured the relative time to completion for the tasks of ingesting raw data, creating a graph representation of the data, and fitting a power-law distribution to the empirical observations. The results of our experiments, run on graphs ranging in size from 1 million to 20 million edges, show that the parallelized workflow requires significantly less time to generate a graph from an edge list, compute the degree of every node in the graph, and fit the scale and location parameters to the observed data.
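
The workflow above is timed task by task. The following is a minimal, dependency-free sketch of that task-wise timing for the ingest, graph-construction, and degree steps. It stands in for the paper's Pandas/NetworkX code with the Python standard library, and the function names and the headerless `src,dst` edge-list format are hypothetical. In the actual pipelines each stage would be a library call (Pandas and NetworkX in the serial version, RAPIDS counterparts such as cuDF and cuGraph in the parallel one), which is not reproduced here.

```python
import csv
import io
import time
from collections import defaultdict


def timed(label, timings, fn, *args):
    """Run fn(*args), store its wall-clock duration under `label`, return its result."""
    start = time.perf_counter()
    result = fn(*args)
    timings[label] = time.perf_counter() - start
    return result


def ingest(text):
    """Parse a headerless 'src,dst' edge list (one edge per line) into int pairs."""
    return [(int(s), int(d)) for s, d in csv.reader(io.StringIO(text))]


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbours.

    Assumes a simple graph: duplicate edges collapse, and a self-loop adds
    1 (not 2) to a node's degree.
    """
    adj = defaultdict(set)
    for s, d in edges:
        adj[s].add(d)
        adj[d].add(s)
    return adj


def degrees(adj):
    """Degree of every node = number of distinct neighbours."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def run_pipeline(text):
    """Run ingest -> graph -> degree, timing each task separately."""
    timings = {}
    edges = timed("ingest", timings, ingest, text)
    adj = timed("build_graph", timings, build_graph, edges)
    deg = timed("degree", timings, degrees, adj)
    return deg, timings


if __name__ == "__main__":
    deg, timings = run_pipeline("0,1\n0,2\n0,3\n1,2\n")
    print(deg)  # {0: 3, 1: 2, 2: 2, 3: 1}
    for task, seconds in timings.items():
        print(f"{task}: {seconds:.6f} s")
```

Routing every stage through the same timing helper keeps the per-task measurements comparable when one stage's implementation is swapped for a GPU-backed one.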

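The fitting step pairs a Kolmogorov-Smirnov criterion for the location with a maximum likelihood estimate of the scale. A minimal sketch of that procedure follows, in the form described by Clauset, Shalizi and Newman, reading "location" as the lower cutoff x_min and "scale" as the exponent alpha. It assumes the continuous power-law approximation, with alpha_hat = 1 + n / sum(ln(x_i / x_min)) over the n observations x_i >= x_min. Degree data are integers, and the exact discrete estimator (which normalizes with the Hurwitz zeta function) is not shown. The function names and the `min_tail` threshold are illustrative choices, not the paper's code.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Continuous MLE of the exponent for positive values >= xmin.

    alpha_hat = 1 + n / sum(ln(x_i / xmin)); returns None when every value
    equals xmin, because the estimator is undefined there.
    """
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        return None
    return 1.0 + len(tail) / log_sum


def ks_distance(sorted_tail, xmin, alpha):
    """Largest gap between the tail's empirical CDF and the fitted CDF
    P(x) = 1 - (x / xmin) ** (1 - alpha)."""
    n = len(sorted_tail)
    d = 0.0
    for i, x in enumerate(sorted_tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just below (i/n) and at ((i+1)/n) x.
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d


def fit_power_law(data, min_tail=50):
    """Try each distinct value as xmin, fit alpha by MLE on its tail, and keep
    the (alpha, xmin, D) triple with the smallest KS distance D."""
    xs = sorted(data)
    best = None
    for i, xmin in enumerate(xs):
        if i > 0 and xs[i - 1] == xmin:
            continue  # each distinct value is tried once, with its full tail
        tail = xs[i:]
        if len(tail) < min_tail:
            break  # remaining tails are too small to fit reliably
        alpha = fit_alpha(tail, xmin)
        if alpha is None:
            continue
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (alpha, xmin, d)
    return best


if __name__ == "__main__":
    # Synthetic continuous power-law sample (alpha = 2.5, xmin = 1) drawn by
    # inverse transform: x = xmin * (1 - u) ** (-1 / (alpha - 1)).
    rng = random.Random(0)
    sample = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]
    alpha, xmin, d = fit_power_law(sample)
    print(f"alpha = {alpha:.3f}, xmin = {xmin:.3f}, KS D = {d:.4f}")
```

The scan refits and re-tests the whole tail for every candidate x_min, so its cost grows roughly with the number of distinct values times the number of observations.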