Evolutionary Compilation to Long Instruction Superscalar Microarchitectures for Exploiting Parallelism At All Levels

VLIW seeks to perform all prediction before run time, wherever possible. Superscalar hopes to deliver reasonable performance even without help from the compiler. Let's assume the solution includes dynamic scheduling hardware (as in a superscalar), but with parallelism expressed explicitly in the ISA (i.e., an EPIC ISA). The compiler's profile-based predictions are input-dependent, yet they are assumed to hold for the entire life of the program. Common sense tells us that programs are used differently even by the same user (say, as the user becomes a "power user"). The hardware's predictions are very accurate, but the parallelism they can exploit is limited to a short distance: the size of the hardware instruction window. The compiler thus has the right (large) scope but poor prediction, while the hardware has excellent prediction but terrible scope. A war has been raging between the two approaches. Rather than continue it, it is tempting to combine them. I propose adding levels to the current, rather stunted, hierarchy of parallelism exploitation.
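
To make the scope-versus-prediction tradeoff concrete, here is a toy Python sketch. It is not from the paper, and every name, trace, and parameter in it (schedule_cycles, the two 8-instruction chains, the 4-entry window, the branch traces) is a hypothetical illustration. The first part models an idealized dynamic scheduler, assuming unit latency, unlimited functional units, and in-order retirement, and compares a bounded hardware window with the unbounded scope a compiler sees. The second part compares a static prediction fixed from a profiling run with a 2-bit saturating counter when the program's behavior on a later input differs from the profiled one.

```python
# Toy model of the compiler-vs-hardware tradeoff.
# All names, traces, and parameters are hypothetical illustrations,
# not taken from the paper.


def schedule_cycles(deps, window=None):
    """Cycles to run a trace on an idealized dynamic scheduler.

    deps[i] lists the earlier instructions that instruction i depends on.
    Assumes unit latency, unlimited functional units, and in-order
    retirement; instruction i may not enter a window of size `window`
    until instruction i - window has retired. window=None models the
    unbounded scope a compiler sees.
    """
    n = len(deps)
    finish = [0] * n
    retire = [0] * n
    for i in range(n):
        ready = max((finish[d] for d in deps[i]), default=0)
        if window is not None and i >= window:
            ready = max(ready, retire[i - window])
        finish[i] = ready + 1
        retire[i] = max(finish[i], retire[i - 1] if i > 0 else 0)
    return retire[-1] if n else 0


def profile_prediction(training):
    """Static prediction fixed at compile time: the majority direction
    observed while profiling on a training input."""
    return 2 * sum(training) >= len(training)


def static_accuracy(prediction, outcomes):
    """Fraction of outcomes matched by a fixed prediction."""
    return sum(o == prediction for o in outcomes) / len(outcomes)


def two_bit_accuracy(outcomes):
    """Accuracy of a 2-bit saturating counter, a simple hardware
    predictor that keeps adapting at run time."""
    counter = 2  # start in the weakly-taken state
    correct = 0
    for taken in outcomes:
        correct += (counter >= 2) == taken
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


if __name__ == "__main__":
    # Two independent 8-instruction dependence chains, one after the other.
    chain_a = [[]] + [[i - 1] for i in range(1, 8)]
    chain_b = [[]] + [[i - 1] for i in range(9, 16)]
    trace = chain_a + chain_b
    print("unbounded scope:", schedule_cycles(trace), "cycles")
    print("4-entry window: ", schedule_cycles(trace, window=4), "cycles")

    # One branch whose behavior flips between the profiling input and
    # the input actually used later.
    training = [True] * 9 + [False]
    later_use = ([True] + [False] * 9) * 10
    fixed = profile_prediction(training)
    print("profile-based:", static_accuracy(fixed, later_use))
    print("2-bit counter:", two_bit_accuracy(later_use))
```

Running it reports 8 cycles with unbounded scope versus 13 with a 4-entry window, because the second chain cannot enter the window until most of the first has retired. On the shifted input, the profile-based prediction is right 10% of the time, while the 2-bit counter reaches 89%. The numbers depend entirely on the made-up traces and only illustrate the direction of each effect.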
