Paper information - Fast graph simplification for interleaved Dyck-reachability

Fast graph simplification for interleaved Dyck-reachability

Many program-analysis problems can be formulated as graph-reachability problems. Interleaved Dyck language reachability (InterDyck-reachability) is a fundamental framework to express a wide variety of program-analysis problems over edge-labeled graphs. The InterDyck language represents an intersection of multiple matched-parenthesis languages (i.e., Dyck languages). In practice, program analyses typically leverage one Dyck language to achieve context-sensitivity, and other Dyck languages to model data dependences, such as field-sensitivity and pointer references/dereferences. In the ideal case, an InterDyck-reachability framework should model multiple Dyck languages simultaneously. Unfortunately, precise InterDyck-reachability is undecidable, so any practical solution must over-approximate the exact answer. The literature contains many proposals for over-approximating the InterDyck-reachability formulation. This paper offers a new perspective on improving both the precision and the scalability of InterDyck-reachability: we aim to simplify the underlying input graph G. Our key insight is based on the observation that if an edge does not contribute to any InterDyck-path, we can safely eliminate it from G. Our technique is orthogonal to the InterDyck-reachability formulation, and can serve as a pre-processing step for any over-approximating approach to InterDyck-reachability. We have applied our graph simplification algorithm to pre-process the graphs from a recent InterDyck-reachability-based taint analysis for Android. Our evaluation on three popular InterDyck-reachability algorithms yields promising results. In particular, our graph-simplification method improves both the scalability and precision of all three InterDyck-reachability algorithms, sometimes dramatically.
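To make the InterDyck language from the abstract concrete, the following minimal sketch checks whether a path's label string belongs to it. It is an illustration only, not the paper's simplification algorithm. The symbol encoding (a `(family, kind, index)` tuple), the family names `"paren"` and `"bracket"`, and the function names are assumptions introduced here. A string is accepted when its projection onto each bracket family is balanced on its own.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical encoding: (family, "open" | "close", index),
# e.g. ("paren", "open", 1) stands for the labeled parenthesis "(_1".
Symbol = Tuple[str, str, int]


def is_dyck(word: Iterable[Symbol]) -> bool:
    """Check that `word` is balanced when read as ONE Dyck language.

    Every closing symbol must match the family and index of the most
    recent unmatched opening symbol.
    """
    stack: List[Tuple[str, int]] = []
    for family, kind, index in word:
        if kind == "open":
            stack.append((family, index))
        elif not stack or stack.pop() != (family, index):
            return False
    return not stack


def is_interdyck(word: Iterable[Symbol],
                 families: Sequence[str] = ("paren", "bracket")) -> bool:
    """Check that the projection of `word` onto each family is balanced.

    Symbols of one family are ignored while checking another, so the
    families may interleave freely. Symbols whose family is not listed
    in `families` are ignored entirely.
    """
    symbols = list(word)
    return all(
        is_dyck([s for s in symbols if s[0] == family])
        for family in families
    )


# "(_1 [_1 )_1 ]_1": the two families cross each other.
crossing = [
    ("paren", "open", 1),
    ("bracket", "open", 1),
    ("paren", "close", 1),
    ("bracket", "close", 1),
]
print(is_interdyck(crossing))  # each projection is balanced
print(is_dyck(crossing))       # not balanced as a single Dyck language
```

The crossing string shows why the interleaved language is more permissive than one Dyck language over both bracket families: the parentheses and the brackets are each matched, but they are not properly nested with respect to one another.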

Thomas Reps | Yuanbo Li | Qirun Zhang
