On fast large-scale program analysis in Datalog

Designing and implementing a static program analysis is challenging. The challenges include modelling the semantics of the input language, finding suitable abstractions for the analysis, and handwriting efficient analysis code in a traditional imperative language such as C++. As a result, developing static program analysis tools for real-world languages is costly in both time and resources. To overcome, or at least alleviate, these costs, Datalog has been proposed as a domain-specific language (DSL). With Datalog, a designer expresses a static program analysis as a logical specification. While a domain-specific language eases the development of program analyses, it is commonly accepted that such an approach has worse runtime performance than handcrafted static analysis tools. In this work, we introduce a new program synthesis methodology that translates Datalog specifications into highly efficient monolithic C++ analyzers. The synthesis technique reinterprets semi-naive evaluation as a scaffolding for translation using partial evaluation. To achieve high performance, we employ staged-compilation techniques and specialize the underlying relational data structures for a given Datalog specification. Experiments on benchmarks for large-scale program analysis show that our approach outperforms available Datalog tools and is competitive with state-of-the-art handcrafted tools.
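To make the idea of specializing semi-naive evaluation concrete, the following is a minimal hand-written sketch of the kind of loop nest one might obtain for a two-rule transitive-closure specification. It is an illustration only, not the output of the synthesis technique described above: the function name `computePath`, the `Tuple` and `Relation` aliases, and the use of `std::set` as the relational data structure are all assumptions made for this example.

```cpp
#include <cassert>
#include <limits>
#include <set>
#include <utility>

// Hypothetical names for illustration; a binary relation is a sorted set of pairs.
using Tuple = std::pair<int, int>;
using Relation = std::set<Tuple>;

// Semi-naive evaluation, written out by hand for the Datalog specification
//   path(x, y) :- edge(x, y).
//   path(x, z) :- path(x, y), edge(y, z).
Relation computePath(const Relation& edge) {
    Relation path = edge;   // non-recursive rule: every edge is a path
    Relation delta = edge;  // tuples that were new in the previous iteration

    while (!delta.empty()) {
        Relation newDelta;
        for (const Tuple& p : delta) {
            assert(path.count(p) == 1);  // invariant: delta is a subset of path
            // Range scan over the sorted set: all edge tuples whose first
            // column equals p.second (the join variable y).
            const Tuple lower{p.second, std::numeric_limits<int>::min()};
            for (auto it = edge.lower_bound(lower);
                 it != edge.end() && it->first == p.second; ++it) {
                const Tuple derived{p.first, it->second};
                // Keep only tuples that are not already known.
                if (path.count(derived) == 0) newDelta.insert(derived);
            }
        }
        path.insert(newDelta.begin(), newDelta.end());
        delta = std::move(newDelta);  // next round joins only the new tuples
    }
    return path;
}
```

Each iteration joins only the tuples discovered in the previous round against `edge`, which is what distinguishes semi-naive from naive evaluation. The ordered set stands in for a data structure chosen to suit the access pattern of the rule, here a lookup on the first column of `edge`.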
