Fast graph simplification for interleaved Dyck-reachability

Many program-analysis problems can be formulated as graph-reachability problems. Interleaved Dyck language reachability (InterDyck-reachability) is a fundamental framework for expressing a wide variety of program-analysis problems over edge-labeled graphs. The InterDyck language represents an intersection of multiple matched-parenthesis languages (i.e., Dyck languages). In practice, program analyses typically use one Dyck language to achieve context-sensitivity and other Dyck languages to model data dependences, such as field-sensitivity and pointer references/dereferences. Ideally, an InterDyck-reachability framework should model multiple Dyck languages simultaneously. Unfortunately, precise InterDyck-reachability is undecidable, so any practical solution must over-approximate the exact answer. Much prior work has proposed ways to over-approximate the InterDyck-reachability formulation. This paper offers a new perspective on improving both the precision and the scalability of InterDyck-reachability: we aim to simplify the underlying input graph G. Our key insight is that if an edge does not contribute to any InterDyck-path, we can safely eliminate it from G. Our technique is orthogonal to the InterDyck-reachability formulation and can serve as a pre-processing step for any over-approximating approach to InterDyck-reachability. We have applied our graph-simplification algorithm to pre-process the graphs from a recent InterDyck-reachability-based taint analysis for Android. Our evaluation on three popular InterDyck-reachability algorithms yields promising results. In particular, our graph-simplification method improves both the scalability and the precision of all three InterDyck-reachability algorithms, sometimes dramatically.
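To make the notion of an interleaved Dyck language concrete, the following is a minimal sketch of a membership test for a single label string, not the paper's graph-simplification algorithm. It assumes a hypothetical encoding in which each edge label is a triple (family, kind, is_open), where the family names the Dyck language the symbol belongs to (for example "ctx" for call/return matching and "fld" for field writes/reads). Under that assumption, a string is accepted when its projection onto each family is balanced on its own.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical label encoding (an assumption for illustration):
# (family, kind, is_open), e.g. ("ctx", 1, True) is an open parenthesis
# of kind 1 in the Dyck language named "ctx".
Label = Tuple[str, int, bool]


def is_dyck(symbols: Iterable[Tuple[int, bool]]) -> bool:
    """Check that a sequence of (kind, is_open) symbols is balanced."""
    stack: List[int] = []
    for kind, is_open in symbols:
        if is_open:
            stack.append(kind)
        elif not stack or stack.pop() != kind:
            # A close symbol with no matching open of the same kind.
            return False
    return not stack  # every open symbol must have been closed


def is_interdyck(word: Sequence[Label], families: Sequence[str]) -> bool:
    """Accept the word if its projection onto each family is balanced.

    Symbols whose family is not listed in `families` are ignored.
    """
    return all(
        is_dyck((kind, is_open) for fam, kind, is_open in word if fam == family)
        for family in families
    )


# The two families interleave: "(1 [1 )1 ]1" is not balanced as a single
# Dyck word over both bracket types, but each projection is balanced.
interleaved = [
    ("ctx", 1, True),
    ("fld", 1, True),
    ("ctx", 1, False),
    ("fld", 1, False),
]
# Here the "ctx" projection closes kind 2 after opening kind 1.
mismatched = [("ctx", 1, True), ("ctx", 2, False)]

print(is_interdyck(interleaved, ["ctx", "fld"]))
print(is_interdyck(mismatched, ["ctx", "fld"]))
```

Checking one given string is straightforward; the undecidability mentioned above concerns deciding whether some path between two graph nodes spells such a string, which is why practical algorithms over-approximate.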