Optimization of VLIW compatibility systems employing dynamic rescheduling

Lack of object code compatibility in VLIW architectures is a severe limit to their adoption as a general-purpose computing paradigm. Previous approaches include hardware and software techniques, both of which have drawbacks: hardware techniques add to the complexity of the architecture, whereas software techniques require multiple executables. This paper presents a technique called Dynamic Rescheduling that applies software techniques dynamically, using intervention by the OS: at each first-time page fault, the page of code is rescheduled for the new machine generation, if required. Results are presented to demonstrate the viability of the technique using the Illinois IMPACT compiler and the TINKER architectural framework. For the machine models and the workloads used in this study, the performance of the rescheduled code compares well with that of code scheduled natively for the machine. A subset of the programs in the workload incurs a large number of first-time page faults, so their rescheduling overhead is high relative to their total execution time. Such programs are called high-overhead programs. To reduce this overhead, caching of translated pages across multiple invocations of a program, using a persistent rescheduled-page cache (PRC), is discussed. For the workload used in this evaluation, a PRC of between 512 and 1024 pages that uses an overhead-based page replacement policy was found to be effective in reducing the overhead.
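The control flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `RescheduledPageCache`, `on_first_time_page_fault`, and the `reschedule` callback are hypothetical, and the abstract does not define the overhead-based replacement policy. The sketch assumes the policy evicts the cached page that was cheapest to reschedule, and skips caching a new page that is cheaper than everything already held.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

PageKey = Tuple[str, int]  # (program name, page number); hypothetical key layout


@dataclass
class CachedPage:
    code: bytes
    overhead: float  # cost that was paid to reschedule this page


@dataclass
class RescheduledPageCache:
    """Toy PRC: holds rescheduled pages across program invocations."""
    capacity: int  # assumed >= 1
    pages: Dict[PageKey, CachedPage] = field(default_factory=dict)

    def lookup(self, key: PageKey) -> Optional[bytes]:
        entry = self.pages.get(key)
        return entry.code if entry is not None else None

    def insert(self, key: PageKey, code: bytes, overhead: float) -> None:
        if key not in self.pages and len(self.pages) >= self.capacity:
            # Assumed policy: the victim is the page cheapest to reschedule again.
            victim = min(self.pages, key=lambda k: self.pages[k].overhead)
            if self.pages[victim].overhead >= overhead:
                return  # the new page is the cheapest of all; do not cache it
            del self.pages[victim]
        self.pages[key] = CachedPage(code, overhead)


def on_first_time_page_fault(
    key: PageKey,
    code: bytes,
    page_generation: int,
    machine_generation: int,
    cache: RescheduledPageCache,
    reschedule: Callable[[bytes], Tuple[bytes, float]],
) -> bytes:
    """Return the code to map in for a page touched for the first time."""
    if page_generation == machine_generation:
        return code  # already scheduled for this machine; nothing to do
    cached = cache.lookup(key)
    if cached is not None:
        return cached  # rescheduled during an earlier invocation
    new_code, overhead = reschedule(code)  # hypothetical rescheduler callback
    cache.insert(key, new_code, overhead)
    return new_code
```

In this sketch a high-overhead program benefits on later invocations because its expensive pages stay in the cache, while cheap pages are the first to be displaced.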
