Exploring Parallel Algorithmic Choices for Graph Analytics

Sequential graph algorithms execute tasks in a fixed order to achieve high work efficiency. Exposing parallelism in these ordered workloads is difficult. Strict-ordered parallel implementations find nodes that have no read-write dependencies and can therefore be executed in parallel. Because they keep the strict ordering constraints, they match the work efficiency of their sequential counterparts. More parallelism can be obtained at the expense of redundant work. Relax-ordered implementations remove the global order and impose only the local order. They run for multiple iterations, and their output values increase or decrease monotonically, which allows them to converge efficiently. Unordered implementations go one step further and remove the local order as well. Without any order, they perform a large amount of redundant work but expose more parallelism. Different parallel implementations perform best for different algorithms, and the best parallel version may also change as the input graph changes. The choice of the optimal parallel implementation is strongly correlated with the characteristics of the graph benchmark and its input. This work proposes an analytical prediction model that chooses the optimal parallel im-
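
The trade-off between ordering and redundant work described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes single-source shortest paths as the benchmark, runs sequentially, and counts edge relaxations as a rough proxy for work. The toy graph `GRAPH` and the function names `sssp_strict`, `sssp_relaxed`, and `sssp_unordered` are hypothetical, and the FIFO worklist is only one possible reading of "relax-ordered".

```python
import heapq
from collections import deque

INF = float("inf")

# Hypothetical toy graph: node -> list of (neighbor, edge weight).
GRAPH = {
    0: [(1, 4), (2, 1)],
    1: [(3, 1)],
    2: [(1, 2), (3, 5)],
    3: [],
}


def sssp_strict(graph, src):
    """Strict order (Dijkstra style): always process the globally closest node."""
    dist = {v: INF for v in graph}
    dist[src] = 0
    heap = [(0, src)]
    relaxations = 0
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u]:
            relaxations += 1
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist, relaxations


def sssp_relaxed(graph, src):
    """Relaxed order (assumed here as a FIFO worklist): no global priority."""
    dist = {v: INF for v in graph}
    dist[src] = 0
    work = deque([src])
    relaxations = 0
    while work:
        u = work.popleft()
        for v, w in graph[u]:
            relaxations += 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w  # distances only ever decrease
                work.append(v)         # node may be re-processed later
    return dist, relaxations


def sssp_unordered(graph, src):
    """Unordered (Bellman-Ford style): sweep every node until nothing changes."""
    dist = {v: INF for v in graph}
    dist[src] = 0
    relaxations = 0
    changed = True
    while changed:
        changed = False
        for u in graph:  # in a parallel version each node could be a task
            for v, w in graph[u]:
                relaxations += 1
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
    return dist, relaxations


if __name__ == "__main__":
    for fn in (sssp_strict, sssp_relaxed, sssp_unordered):
        dist, work = fn(GRAPH, 0)
        print(fn.__name__, dist, "relaxations:", work)
```

All three variants return the same distances on this graph, but the number of relaxations grows as ordering constraints are dropped. In a parallel setting the less ordered variants would in exchange offer more independent tasks per step, which is the tension a prediction model has to weigh.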
