An Efficient GPU Implementation of Inclusion-Based Pointer Analysis

We present an efficient GPU implementation of Andersen's whole-program inclusion-based pointer analysis, a fundamental analysis on which many others are based, such as those used in optimising compilers, bug detection and security analysis. Andersen's algorithm makes extensive modifications to the graph that represents the pointer-manipulating statements in a program. These modifications are highly irregular, input-dependent and statically unpredictable, which makes such graph workloads much harder to balance across a multitude of GPU cores than those of traditional graph algorithms such as DFS and BFS. To parallelise Andersen's analysis efficiently on GPUs, we introduce an imbalance-aware workload partitioning scheme that divides the workload dynamically among the concurrent warps. The scheme starts in a warp-centric manner (the coarse-grain stage) and switches to a task-pool-based model once a workload imbalance is detected (the fine-grain stage). We further improve performance with an adaptive group propagation scheme that reduces redundant traversals. On a set of 14 C benchmarks, our parallel implementation of Andersen's analysis achieves an average speedup of 46 percent over the state of the art on an NVIDIA Tesla K20c GPU.
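For readers unfamiliar with the analysis being parallelised, the following is a minimal sequential sketch of inclusion-based (Andersen-style) pointer analysis in Python. It is a textbook-style worklist formulation, not the GPU implementation described above: it has no warp-centric partitioning, task pool, or group propagation. The function name `andersen` and the tuple encoding of statements are assumptions made for illustration only. The sketch shows why the constraint graph changes during the analysis: load and store statements add new copy edges as points-to sets grow.

```python
from collections import defaultdict, deque

def andersen(addr_of, copies, loads, stores):
    """Hypothetical helper; statement encoding is an assumption.

    addr_of: (p, a) for p = &a      copies: (p, q) for p = q
    loads:   (p, q) for p = *q      stores: (p, q) for *p = q
    """
    pts = defaultdict(set)         # points-to set of each variable
    succ = defaultdict(set)        # copy edge src -> dst means pts(src) is a subset of pts(dst)
    loads_by = defaultdict(list)   # q -> [p, ...] for each p = *q
    stores_by = defaultdict(list)  # p -> [q, ...] for each *p = q

    for p, a in addr_of:
        pts[p].add(a)
    for p, q in copies:
        succ[q].add(p)
    for p, q in loads:
        loads_by[q].append(p)
    for p, q in stores:
        stores_by[p].append(q)

    work = deque(list(pts))
    while work:
        n = work.popleft()
        # Load/store constraints add new edges as pts(n) grows.
        for o in list(pts[n]):
            for p in loads_by[n]:      # p = *n adds edge o -> p
                if p not in succ[o]:
                    succ[o].add(p)
                    work.append(o)
            for q in stores_by[n]:     # *n = q adds edge q -> o
                if o not in succ[q]:
                    succ[q].add(o)
                    work.append(q)
        # Propagate pts(n) along outgoing copy edges.
        for d in list(succ[n]):
            before = len(pts[d])
            pts[d] |= pts[n]
            if len(pts[d]) != before:
                work.append(d)
    return {v: set(s) for v, s in pts.items() if s}

# Example program: p = &a; q = &b; *p = q; r = *p; s = p
result = andersen(
    addr_of=[("p", "a"), ("q", "b")],
    copies=[("s", "p")],
    loads=[("r", "p")],
    stores=[("p", "q")],
)
```

In this example the store `*p = q` adds an edge from `q` to `a` only after `a` appears in the points-to set of `p`. This kind of input-dependent edge insertion is what makes the workload hard to predict statically.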
