Hot cold optimization of large Windows/NT applications

A dynamic instruction trace often contains many unnecessary instructions that are required only by the unexecuted portion of the program. Hot-cold optimization (HCO) is a technique that exploits this performance opportunity. HCO uses profile information to partition each routine into frequently executed (hot) and infrequently executed (cold) parts. Unnecessary operations are removed from the hot part, and compensation code is added on transitions from hot to cold as needed. We evaluate HCO on a collection of large Windows/NT applications. HCO is most effective on programs that are call-intensive and have flat profiles, where it provides a 3-8% reduction in path length beyond conventional optimization.
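
The partitioning step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Block` type, the function names, the example profile counts, and the 1%-of-peak threshold are all assumptions made for illustration. The sketch only classifies basic blocks by profiled execution count and lists the hot-to-cold edges, which are the places where compensation code might be required.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A basic block with a profiled execution count (hypothetical model)."""
    name: str
    count: int
    successors: list  # names of successor blocks


def partition_hot_cold(blocks, threshold_fraction=0.01):
    """Split blocks into hot and cold sets.

    A block is treated as hot if its count is at least
    threshold_fraction of the routine's peak count. The threshold
    is an assumed heuristic, not a value taken from the paper.
    """
    if not blocks:
        return set(), set()
    peak = max(b.count for b in blocks)
    cutoff = threshold_fraction * peak
    hot = {b.name for b in blocks if peak > 0 and b.count >= cutoff}
    cold = {b.name for b in blocks} - hot
    return hot, cold


def hot_to_cold_edges(blocks, hot, cold):
    """Return control-flow edges that leave the hot part for the cold part."""
    return [
        (b.name, s)
        for b in blocks
        if b.name in hot
        for s in b.successors
        if s in cold
    ]


# Made-up profile for one routine with a rarely taken error path.
routine = [
    Block("entry", 1000, ["check"]),
    Block("check", 1000, ["fast", "error"]),
    Block("fast", 990, ["exit"]),
    Block("error", 2, ["exit"]),
    Block("exit", 992, []),
]

hot, cold = partition_hot_cold(routine)
edges = hot_to_cold_edges(routine, hot, cold)
print(sorted(hot), sorted(cold), edges)
```

In this toy routine only the `error` block falls below the cutoff, so the single edge from `check` to `error` is where an optimizer in the style of HCO would need to restore any state that was optimized away in the hot part.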
