Paper information - Evaluation of scheduling techniques on a SPARC-based VLIW testbed

Evaluation of scheduling techniques on a SPARC-based VLIW testbed

The performance of Very Long Instruction Word (VLIW) microprocessors depends on the close cooperation between the compiler and the architecture. This paper evaluates a set of important compilation techniques and related architectural features for VLIW machines. The evaluation is performed on a SPARC-based VLIW testbed where gcc-generated optimized SPARC code is scheduled into high-performance VLIW code. As a base scheduling compiler, we experiment with three core scheduling techniques: enhanced pipeline scheduling, all-path speculation, and renaming. We analyze the characteristics of the useful and useless ALUs in each cycle to see how many of those ALUs execute non-speculative operations, speculative operations, and copies, respectively. Then, we evaluate the following compilation techniques: software pipelining, loop unrolling, non-greedy enhanced pipeline scheduling, profile-based all-path speculation, trace-based speculation, renaming, restricted speculative loads, and memory disambiguation. Since we experiment on a uniform testbed based on a detailed analysis of ALUs, our evaluation provides useful insight into the performance impact of these techniques.

[1] Soo-Mook Moon, et al. SPARC-based VLIW testbed, 1998.

[2] Scott A. Mahlke, et al. IMPACT: an architectural framework for multiple-instruction-issue processors, 1991, ISCA '91.

[3] Soo-Mook Moon, et al. Parallelizing nonnumerical code with selective scheduling and software pipelining, 1997, TOPL.

[4] David W. Wall, et al. Limits of instruction-level parallelism, 1991, ASPLOS IV.

[5] Monica S. Lam, et al. Limits of control flow on parallelism, 1992, ISCA '92.

[6] Kemal Ebcioglu, et al. An architectural framework for supporting heterogeneous instruction-set architectures, 1993, Computer.

[7] Scott Mahlke, et al. Exploiting Instruction Level Parallelism in the Presence of Conditional Branches, 1997.

[8] Wen-mei W. Hwu, et al. IMPACT: an architectural framework for multiple-instruction-issue processors, 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.

[9] Soo-Mook Moon, et al. Generalized Multiway Branch Unit for VLIW Microprocessors, 1995, IEEE Trans. Parallel Distributed Syst.

[10] Jack W. Davidson, et al. Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation, 1995, MICRO 1995.

[11] Toshio Nakatani, et al. Making Compaction-Based Parallelization Affordable, 1993, IEEE Trans. Parallel Distributed Syst.

[12] Wen-mei W. Hwu, et al. Modulo scheduling of loops in control-intensive non-numeric programs, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[13] Michael D. Smith, et al. Boosting beyond static scheduling in a superscalar processor, 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[14] Erik R. Altman, et al. Daisy: Dynamic Compilation For 100% Architectural Compatibility, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[15] G. B. Steven, et al. Using a resource limited instruction scheduler to evaluate the iHARP processor, 1995.

[16] K. Ebcioglu, et al. Daisy: Dynamic Compilation For 100% Architectural Compatibility, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.