Hyperblock performance optimizations for ILP processors

ACKNOWLEDGEMENTS

I would first like to thank my advisor, Professor Wen-mei Hwu, for his guidance and support. Credit is due to the IMPACT group for creating such a useful framework upon which this work was built. I also wish to acknowledge the IMPACT group for graciously offering assistance in many forms. I am especially indebted to Scott Mahlke. His advice, guidance, and assistance were extremely valuable. John Gyllenhaal kindly enhanced the IMPACT profiler at my request. The development and testing that this involved are greatly appreciated. The Office of Naval Research and the University of Illinois provided financial support through fellowships. I am ever grateful to my family for their love, encouragement, and support. Thank you, Dad, for being a role model and for sparking my interest in this field as early as 1976. Thank you, Mom, for, as you frequently remind me, teaching me everything you know about computers. Also, thank you for the insight, guidance, and values by which I live. Finally, to Kathy, whose love and strength were vital, and to Murphy, whose loyal companionship during many late nights kept me human, thank you.
