Static analysis tools for C often use intermediate representations (IRs) that organize program data in a simple, well-structured manner. However, the C parsers that create IRs are slow, and because they are difficult to write, only a few implementations exist, which limits the languages in which a C static analysis can be written. To solve these problems, we investigate two language-independent, on-disk representations of C IRs: one using XML and the other using an Internet standard binary encoding called eXternal Data Representation (XDR). We benchmark the parsing speeds of both options and find that parsing the XML is about 2 times slower than parsing C, whereas parsing the XDR is over 6 times faster. Furthermore, we show that the XML files are far too large, at 19 times the size of the C source code, whereas the XDR files are only 2.2 times the C size. We also demonstrate the portability of our XDR system by presenting a C source code querying tool written in Ruby. Our solution and the insights we gained from building it will be useful to analysis authors and other clients of C IRs. We have made our software freely available for download at http://www.cs.umd.edu/projects/PL/scil/. Copyright © 2010 John Wiley & Sons, Ltd.
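
To give a rough sense of why a binary XDR encoding is more compact than XML, the following minimal sketch hand-encodes one small record using the basic XDR rules (4-byte big-endian integers, and length-prefixed strings padded to a multiple of 4 bytes). The record layout (`name`, `type_tag`, `line`), the function names, and the XML element are hypothetical illustrations, not the format used by our system.

```python
import struct

def xdr_uint(n):
    """XDR unsigned int: 4 bytes, big-endian."""
    return struct.pack(">I", n)

def xdr_string(s):
    """XDR string: 4-byte length, the bytes, zero padding to a multiple of 4."""
    data = s.encode("ascii")
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad

def encode_var_decl(name, type_tag, line):
    """Encode a hypothetical IR record: variable name, type tag, source line."""
    return xdr_string(name) + xdr_uint(type_tag) + xdr_uint(line)

def decode_var_decl(buf):
    """Decode the hypothetical record produced by encode_var_decl."""
    (length,) = struct.unpack_from(">I", buf, 0)
    name = buf[4:4 + length].decode("ascii")
    # Skip the string bytes plus their padding to reach the two integers.
    offset = 4 + length + (4 - length % 4) % 4
    type_tag, line = struct.unpack_from(">II", buf, offset)
    return name, type_tag, line

encoded = encode_var_decl("count", 1, 42)

# A hypothetical XML rendering of the same record, for size comparison only.
xml_equivalent = '<var_decl name="count" type="1" line="42"/>'

if __name__ == "__main__":
    print(len(encoded), len(xml_equivalent))
    print(decode_var_decl(encoded))
```

Decoding needs only fixed-width reads and offset arithmetic, with no tokenizing or tag matching, which is consistent with the speed difference reported above, although this toy record says nothing about the actual measured ratios.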