How to Build Static Checking Systems Using Orders of Magnitude Less Code

Modern static bug-finding tools are complex. They typically consist of hundreds of thousands of lines of code, and most of them are wedded to one language (or even one compiler). This complexity makes the systems hard to understand, hard to debug, and hard to retarget to new languages, which dramatically limits their scope. This paper reduces checking-system complexity by challenging a fundamental assumption: that checkers must depend on a full-blown language specification and compiler front end. Instead, our program checkers are based on drastically incomplete language grammars ("micro-grammars") that describe only the portions of a language relevant to a given checker. As a result, our implementation is tiny: roughly 2,500 lines of code, about two orders of magnitude smaller than a typical system. We hope that this dramatic increase in simplicity will allow people to use more checkers on more systems in more languages. We implement our approach in μchex, a language-agnostic framework for writing static bug checkers. We use it to build micro-grammar-based checkers for six languages (C, the C preprocessor, C++, Java, JavaScript, and Dart) and find over 700 errors in real-world projects.
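To make the micro-grammar idea concrete, the following is a minimal Python sketch of a checker that understands only two tiny fragments of C. The first is an allocation statement of the form `IDENT = malloc(...);`. The second is an `if (...)` head. Every other token is skipped as uninterpreted "water." This is a hypothetical illustration and does not reflect μchex's API or grammar notation. Several pieces are assumptions made for the example:

- the function names (`tokenize`, `match_alloc`, `checks_var`, `check_unchecked_malloc`);
- the "unchecked malloc" rule itself;
- the simplification that the null check must be the very next statement.

```python
import re
from typing import List, Optional, Tuple

# Identifiers become one token; every other non-space character is its own token.
# Tokens that no rule matches are "water" the micro-grammar never interprets.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def tokenize(src: str) -> List[str]:
    # Assumption: comments and string literals are not handled in this sketch.
    return TOKEN_RE.findall(src)


def skip_balanced(toks: List[str], i: int) -> Optional[int]:
    """Given toks[i] == '(', return the index just past the matching ')'."""
    depth = 0
    for j in range(i, len(toks)):
        if toks[j] == "(":
            depth += 1
        elif toks[j] == ")":
            depth -= 1
            if depth == 0:
                return j + 1
    return None  # unbalanced input: the rule simply fails to match


def match_alloc(toks: List[str], i: int) -> Optional[Tuple[str, int]]:
    """Rule: IDENT '=' 'malloc' '(' balanced ')' ';'  -> (var, index after ';')."""
    if (i + 3 < len(toks) and IDENT_RE.fullmatch(toks[i])
            and toks[i + 1] == "=" and toks[i + 2] == "malloc"
            and toks[i + 3] == "("):
        end = skip_balanced(toks, i + 3)
        if end is not None and end < len(toks) and toks[end] == ";":
            return toks[i], end + 1
    return None


def checks_var(toks: List[str], i: int, var: str) -> bool:
    """Rule: 'if' '(' balanced ')' whose condition mentions var."""
    if i + 1 < len(toks) and toks[i] == "if" and toks[i + 1] == "(":
        end = skip_balanced(toks, i + 1)
        return end is not None and var in toks[i + 2:end - 1]
    return False


def check_unchecked_malloc(src: str) -> List[str]:
    """Report variables assigned from malloc() whose next statement does not test them."""
    toks = tokenize(src)
    errors: List[str] = []
    i = 0
    while i < len(toks):
        m = match_alloc(toks, i)
        if m:
            var, nxt = m
            if not checks_var(toks, nxt, var):
                errors.append(var)
            i = nxt
        else:
            i += 1  # not covered by the micro-grammar: skip one token of water
    return errors


if __name__ == "__main__":
    code = """
    void f(int n) {
        char *buf = malloc(n * 2);
        if (buf == NULL) return;
        int *tmp = malloc(sizeof(int) * (n + 1));
        tmp[0] = 1;
    }
    """
    print(check_unchecked_malloc(code))
```

Run on the sample function, the sketch reports `['tmp']`. It does so without modeling declarations, types, or the function signature. The checker needs only enough grammar to find balanced parentheses and the two statement shapes it cares about. That narrowness is the property the abstract attributes to micro-grammars.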
