Toward a Verified Software Toolchain for Java

Software is increasingly complex and unavoidably subject to programming errors (a.k.a. bugs). The problem is well known, and many techniques have been developed to reduce the number of bugs in a program. Among them, this document specifically studies automatic verification techniques that operate at compile time and aim to catch all errors of a certain kind: static analyses and type systems. For example, we can rely on an information flow type system to verify, before running or distributing a program, that it does not leak confidential information to the external environment. One concern is the reliability of such a verification. Indeed, verification tools are themselves complex pieces of software. Moreover, they make assumptions about the execution model of programs, but this model is itself an abstraction of the real compiled code that is eventually run. Therefore the whole software toolchain of a programming language requires reliability. Our ultimate objective is to build such a toolchain for the Java programming language. Our challenge here is to build a verified platform: each component (transformation, verification tool) should be formally specified in an expressive logic, and the correctness of its implementation should be rigorously proved. Proof assistants are of special interest for these tasks: they provide a very rich specification language (based, for example, on higher-order logic) with an automatic mechanism to check the validity of proofs. The proofs we are interested in are especially long and too error-prone to be fully verified by hand. Proof assistants allow for writing programs (here compilers and verification tools), their specifications, and the corresponding correctness proofs in a unified logical framework. Several of them provide an extraction mechanism that automatically generates executable code that fulfills the formalized specification.
We mainly focus on Java because it is a modern language with several challenging features: security mechanisms, type and memory safety, and modularity. Still, many facets of our work are not specific to this programming language. The object-oriented programming paradigm, for example, is largely orthogonal to this work. This document summarises seven years of my research work toward this objective. As we will see in the conclusion, the road to our goal is still long, but we have already learnt some interesting lessons that we share here.
