Static analysis via abstract interpretation of multithreaded programs (Analyse statique de logiciels multitâches par interprétation abstraite)

The goal of this thesis is to present a generic static analysis of Java multithreaded programs. Multithreaded programs execute many tasks, called threads, in parallel. Threads communicate implicitly through shared memory, and they synchronize through monitors, wait-notify primitives, and similar mechanisms. A few years ago, dual-core architectures started to be sold on the mass market at low prices, and today almost all computers have at least two cores. The current trend of the CPU market is manycore, i.e. putting more and more cores on the same CPU. This multicore revolution also raises new challenges on the programming side, since it requires developers to implement multithreaded programs. Multithreading is supported natively by the most common programming languages, e.g. Java and C#.

The goal of static analysis is to compute information about the behavior of a program's executions in a safe and automatic way. One application of static analysis is the development of tools that help debug programs. Many different approaches to static analysis have been proposed. We follow the framework of abstract interpretation, a mathematical theory that allows one to define the semantics of programs and to approximate it soundly. This methodology has already been applied to a wide range of programming languages. The basic idea of a generic analyzer is to develop a tool that can be instantiated with different numerical domains and properties. In recent years many works have addressed this issue, and the resulting analyzers have been successfully applied to debug industrial software. The strength of these analyzers is that most of the analysis can be reused to check several properties. Using different numerical domains makes it possible to choose between faster but less precise analyses and slower but more precise ones. This thesis presents the design of a generic analyzer for multithreaded programs.
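A minimal Python sketch of the generic-analyzer idea follows. It is not taken from Checkmate or Clousot: the domain interface (`const`, `add`, `assume_lt`, `join`, `widen`, `leq`), the helper `loop_head_invariant`, and the toy loop `x = init; while x < bound: x = x + step` are hypothetical, chosen only to show one fixpoint engine being reused with two numerical domains of different cost and precision.

```python
INF = float("inf")


def loop_head_invariant(dom, init, bound, step, max_iter=100):
    """Generic engine: abstract value of x at the head of the hypothetical loop
       x = init; while x < bound: x = x + step
    computed by upward iteration with widening, then one decreasing step."""
    start, inc = dom.const(init), dom.const(step)

    def f(head):  # abstract transfer function of one loop iteration
        return dom.join(start, dom.add(dom.assume_lt(head, bound), inc))

    head = start
    for _ in range(max_iter):
        nxt = dom.widen(head, f(head))
        if dom.leq(nxt, head):  # post-fixpoint reached
            break
        head = nxt
    return f(head)  # still sound: f(head) <= head and f is monotone


def _add_signs(a, b):
    """Possible signs of a + b for single signs a and b."""
    if a == "0":
        return {b}
    if b == "0" or a == b:
        return {a}
    return {"-", "0", "+"}  # "+" plus "-" may have any sign


class Sign:
    """Powerset of {-, 0, +}: cheap, finite height, so widening is plain join."""

    def const(self, n):
        return frozenset({"-" if n < 0 else "0" if n == 0 else "+"})

    def join(self, a, b):
        return a | b

    widen = join

    def leq(self, a, b):
        return a <= b

    def add(self, a, b):
        return frozenset(s for x in a for y in b for s in _add_signs(x, y))

    def assume_lt(self, a, bound):
        # x < bound <= 0 forces x negative; otherwise the test gives no sign info.
        return a if bound > 0 else a & {"-"}


class Interval:
    """Intervals (lo, hi) with None as bottom: more precise, needs widening."""

    def const(self, n):
        return (n, n)

    def join(self, a, b):
        if a is None or b is None:
            return b if a is None else a
        return (min(a[0], b[0]), max(a[1], b[1]))

    def widen(self, a, b):
        if a is None or b is None:
            return b if a is None else a
        lo = a[0] if a[0] <= b[0] else -INF  # unstable bounds jump to infinity
        hi = a[1] if a[1] >= b[1] else INF
        return (lo, hi)

    def leq(self, a, b):
        if a is None:
            return True
        return b is not None and b[0] <= a[0] and a[1] <= b[1]

    def add(self, a, b):
        if a is None or b is None:
            return None
        return (a[0] + b[0], a[1] + b[1])

    def assume_lt(self, a, bound):
        # Integer semantics: x < bound means x <= bound - 1.
        if a is None or a[0] >= bound:
            return None
        return (a[0], min(a[1], bound - 1))
```

On the loop with `init=0`, `bound=100`, `step=1`, the powerset-of-signs domain only proves that `x` is non-negative, while the interval domain, after widening and one decreasing iteration, bounds `x` between 0 and 100. The engine is unchanged; only the plugged-in domain differs.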
First, we define the happens-before memory model in fixpoint form and abstract it with a computable semantics. Memory models define which behaviors are allowed during the execution of a multithreaded program. Starting from the (informal) definition of the happens-before memory model, we define a semantics that builds all the finite executions allowed by this memory model. An execution of a multithreaded program is represented as a function that maps each thread to a trace of states. We show how to design a computable abstract semantics, and we formally prove the correctness of the resulting analysis.

Then we define and abstract a new property focused on the non-deterministic behaviors caused by multithreading, e.g. the arbitrary interleaving of different threads during execution. The non-determinism of a multithreaded program is defined as a difference between executions: if two executions expose different behaviors because of the values read from and written to the shared memory, then the program is not deterministic. We abstract this property at two levels. At the first level, we collect, for each thread, the (abstract) value that it may write into a given location of the shared memory. At the second level, we summarize all the values written in parallel, while tracking the set of threads that may have written them. At the first level, we also introduce the new concept of weak determinism. We propose other ways to relax the determinism property, namely by projecting traces and states, and we define a global hierarchy of these properties. Finally, we formally study how the presence of data races may affect the determinism of the program.

We apply this theoretical framework to Java. In particular, we define a concrete semantics of Java bytecode that follows its specification, and then we abstract it in order to track the information required by the analysis of multithreaded programs.
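The two levels of abstraction of non-determinism can be illustrated with a small Python sketch. It is not the thesis's formalization: the event encoding, the use of integer intervals as abstract values, and the helpers `level1`, `level2`, and `suspicious` are hypothetical. It also computes the abstractions from an explicitly enumerated set of executions and ignores the happens-before order and which writes can actually run in parallel, whereas a static analysis computes this information from the program without enumerating executions.

```python
# Hypothetical model: an execution maps each thread id to its trace, and each
# trace element is (operation, location, value) with "w" = write, "r" = read.
Event = tuple[str, str, int]
Execution = dict[str, list[Event]]


def is_deterministic(executions: list[Execution]) -> bool:
    """Deterministic iff all executions give every thread the same trace."""
    return all(ex == executions[0] for ex in executions[1:])


def level1(executions: list[Execution]) -> dict:
    """First level: for each (thread, location), an interval covering the
    values that the thread may write there."""
    out = {}
    for ex in executions:
        for thread, trace in ex.items():
            for op, loc, value in trace:
                if op == "w":
                    lo, hi = out.get((thread, loc), (value, value))
                    out[(thread, loc)] = (min(lo, value), max(hi, value))
    return out


def level2(first: dict) -> dict:
    """Second level: for each location, the join of the intervals written by
    all threads, paired with the set of threads that may have written them."""
    out = {}
    for (thread, loc), (lo, hi) in first.items():
        (plo, phi), writers = out.get(loc, ((lo, hi), frozenset()))
        out[loc] = ((min(plo, lo), max(phi, hi)), writers | {thread})
    return out


def suspicious(second: dict) -> list:
    """Hypothetical check: locations written by several threads with
    possibly different values may make reads non-deterministic."""
    return sorted(loc for loc, ((lo, hi), ws) in second.items()
                  if len(ws) > 1 and lo != hi)


# x starts at 0; t1 writes 1, t2 writes 2, t3 reads x. Two of the possible
# interleavings, in which t3 reads different values:
interleaving_a = {"t1": [("w", "x", 1)], "t2": [("w", "x", 2)], "t3": [("r", "x", 2)]}
interleaving_b = {"t1": [("w", "x", 1)], "t2": [("w", "x", 2)], "t3": [("r", "x", 1)]}
```

The two interleavings give thread `t3` different traces, so the program is not deterministic. At the second level, location `x` is summarized by the interval [1, 2] written by `{t1, t2}`, which is the kind of information that points to the source of the non-determinism.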
The core of the abstract semantics of Java bytecode is an alias analysis that approximates references in order to identify threads, to check the accesses to the shared memory, and to detect when two threads own a common monitor, thereby inferring which parts of the code cannot be executed in parallel. The generic analyzer described above has been fully implemented, leading to Checkmate, the first generic analyzer of Java multithreaded programs. We report and study in depth some experimental results. In particular, we analyze the precision of the analysis when applied to some common patterns of concurrent programming and to some case studies, and its performance when applied to an incremental application and to a set of well-known benchmarks. An additional contribution of the thesis is the extension of Clousot, an existing industrial generic analyzer, to the checking of buffer overruns. This analysis turns out to be both scalable and precise. In summary, we present an application of an existing, industrial, and generic static analyzer to a property of practical interest, showing the strength of this approach for developing useful tools for developers.
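The way the alias analysis is used to exclude parallel accesses to shared memory can be sketched as a lockset-style check in Python. The `Access` record and the helper functions are hypothetical, and two simplifications are assumed and repeated in the comments: each abstract thread denotes a single concrete thread, and each monitor name denotes a single concrete object (a must-alias fact), since a shared may-alias name alone would not prove mutual exclusion.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    thread: str           # abstract thread; assumed to denote ONE concrete thread
    location: str         # abstract heap location, e.g. "c.count"
    is_write: bool
    monitors: frozenset   # monitors held; assumed must-aliased (one concrete object each)


def may_run_in_parallel(a: Access, b: Access) -> bool:
    """Accesses by the same thread, or by threads holding a common monitor,
    cannot overlap."""
    return a.thread != b.thread and not (a.monitors & b.monitors)


def potential_races(accesses: list[Access]) -> list[tuple[Access, Access]]:
    """Pairs touching the same location, at least one a write, that may overlap."""
    return [
        (a, b)
        for a, b in combinations(accesses, 2)
        if a.location == b.location
        and (a.is_write or b.is_write)
        and may_run_in_parallel(a, b)
    ]


# Two threads increment c.count inside synchronized(lock); a third reads it unlocked.
counter_accesses = [
    Access("t1", "c.count", True, frozenset({"lock"})),
    Access("t2", "c.count", True, frozenset({"lock"})),
    Access("t3", "c.count", False, frozenset()),
]
```

The two synchronized increments are not reported, because both threads hold `lock`; each of them is reported together with the unsynchronized read in `t3`.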
