Quantifying Behavioral Differences Between C and C++ Programs

Improving the performance of C programs has been a topic of great interest for many years. Both hardware technology and compiler optimization research have been applied in an effort to make C programs execute faster. In many application domains, the C++ language is replacing C as the programming language of choice. In this paper, we measure the empirical behavior of a group of significant C and C++ programs and attempt to identify and quantify behavioral differences between them. Our goal is to investigate whether optimization technology that has been successful for C programs will also be successful for C++ programs. Furthermore, we identify behavioral characteristics of C++ programs that suggest optimizations that should be applied to those programs. Our results show that C++ programs exhibit behavior that is significantly different from that of C programs. These results should be of interest to compiler writers and architecture designers who are designing systems to execute object-oriented programs.
