Toward full elasticity in distributed static analysis: the case of callgraph analysis

In this paper we present the design and implementation of a distributed, whole-program static analysis framework that is designed to scale with the size of the input. Our approach is based on the actor programming model and is deployed in the cloud. Our reliance on a cloud cluster provides a degree of elasticity for CPU, memory, and storage resources. To demonstrate the potential of our technique, we show how a typical call graph analysis can be implemented in a distributed setting. The vision that motivates this work is that every large-scale software repository, such as GitHub, BitBucket, or Visual Studio Online, will be able to perform static analysis on a large scale. We experimentally validate our implementation of the distributed call graph analysis using a combination of synthetic and real benchmarks. To show scalability, we demonstrate that the analysis presented in this paper handles inputs of almost 10 million lines of code (LOC) without running out of memory. Our results show that, as we add more virtual machines (VMs), the analysis scales well in terms of memory pressure, independently of the input size. As the number of worker VMs increases, we observe that the analysis time generally improves as well. Lastly, we demonstrate that querying the results can be performed with a median latency of 15 ms.
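To make the actor-based formulation concrete, the following is a minimal single-process sketch, not the implementation described in the paper. It assumes a hypothetical program representation (a dictionary from method names to the names of the methods they may call) and simulates message delivery with an in-memory queue. The names `MethodActor` and `build_call_graph` are illustrative only. In a real deployment the actors would be spread across worker VMs, and call resolution would rely on type information rather than pre-resolved callee names.

```python
from collections import deque


class MethodActor:
    """One actor per method; it owns only that method's call sites (assumed model)."""

    def __init__(self, name, callees):
        self.name = name
        self.callees = callees  # names of methods this method may call
        self.processed = False

    def receive(self, send):
        """Handle an 'analyze' message: emit outgoing edges once and notify callees."""
        if self.processed:
            return []  # already analyzed; repeated messages are no-ops
        self.processed = True
        edges = []
        for callee in self.callees:
            edges.append((self.name, callee))
            send(callee)  # ask the callee's actor to analyze itself
        return edges


def build_call_graph(program, roots):
    """Compute the call edges reachable from the root methods by message passing."""
    actors = {name: MethodActor(name, callees) for name, callees in program.items()}
    mailbox = deque(roots)  # stands in for the distributed message transport
    edges = set()
    while mailbox:
        target = mailbox.popleft()
        actor = actors.get(target)
        if actor is None:
            continue  # callee lies outside the analyzed program (e.g., a library)
        edges.update(actor.receive(mailbox.append))
    return edges


if __name__ == "__main__":
    # Hypothetical toy program: "Dead" is never reached from "Main".
    program = {
        "Main": ["A", "B"],
        "A": ["C"],
        "B": ["C"],
        "C": ["A"],
        "Dead": ["A"],
    }
    for caller, callee in sorted(build_call_graph(program, ["Main"])):
        print(f"{caller} -> {callee}")
```

Because each actor holds only the state for its own method, the state can in principle be partitioned across machines, which is the property that the memory-scaling claim above relies on.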
