Coping with type casts in C

The use of type casts is pervasive in C. Although casts provide great flexibility in writing programs, their use obscures the meaning of programs, and can present obstacles during maintenance. Casts involving pointers to structures (C structs) are particularly problematic, because by using them, a programmer can interpret any memory region to be of any desired type, thereby compromising C's already weak type system. This paper presents an approach for making sense of such casts, in terms of understanding their purpose and identifying fragile code. We base our approach on the observation that casts are often used to simulate object-oriented language features not supported directly in C. We first describe a variety of ways — idioms — in which this is done in C programs. We then develop a notion of physical subtyping, which provides a model that explains these idioms. We have created tools that automatically analyze casts appearing in C programs. Experimental evidence collected by using these tools on a large amount of C code (over a million lines) shows that, of the casts involving struct types, most (over 90%) can be associated meaningfully — and automatically — with physical subtyping. Our results indicate that the idea of physical subtyping is useful in coping with casts and can lead to valuable software productivity tools.
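As an illustration of the kind of idiom the abstract refers to, the following minimal sketch shows one common way C programs simulate inheritance: a "derived" struct embeds a "base" struct as its first member, and pointers are cast between the two. The type and function names (`Point`, `ColorPoint`, `point_sum`, `color_point_sum`, `color_of`) are hypothetical and chosen only for this example. The abstract does not give the formal definition of physical subtyping, so this should be read as an assumed instance of the general idea, not as the paper's own definition.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Hypothetical "base" type. */
struct Point {
    int x;
    int y;
};

/* Hypothetical "derived" type: embeds the base as its FIRST member,
 * so a pointer to a ColorPoint also points at a valid Point. */
struct ColorPoint {
    struct Point base;
    int color;
};

/* Document the layout assumption the casts below rely on. */
static_assert(offsetof(struct ColorPoint, base) == 0,
              "base must be the first member");

/* Works on any object whose storage begins with a struct Point. */
int point_sum(const struct Point *p) {
    return p->x + p->y;
}

/* "Upcast": view a ColorPoint through its Point prefix. */
int color_point_sum(const struct ColorPoint *cp) {
    return point_sum((const struct Point *)cp);
}

/* "Downcast": only valid if p really points at the base member of a
 * ColorPoint; the compiler cannot check this, which is the fragility
 * the abstract warns about. */
int color_of(const struct Point *p) {
    return ((const struct ColorPoint *)p)->color;
}
```

The upcast is safe because C guarantees that a pointer to a struct, suitably converted, points to its first member. The downcast compiles equally well when `p` points at a plain `struct Point`, in which case reading `color` is undefined behavior; casts of this second kind are the ones an automated analysis would most usefully flag.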
