
Performance Comparison of ILP Machines with Cycle Time Evaluation

Abstract: Many studies have investigated performance …

… execution reduces the number of cycles needed for program execution, the complicated hardware required for the mechanism imposes a cycle time penalty. Very long instruction word (VLIW) machines, on the other hand, require only simple hardware, since the complexity of instruction scheduling is transferred to the compiler, so their cycle time penalty is small. A compiler is fundamentally better at instruction scheduling than a dynamic scheduler because it can use a large instruction window and sophisticated algorithms. Thus, VLIW machines have the potential to outperform superscalar machines. Yet a simple VLIW machine has a serious weakness in speculative execution: the compiler has only a limited ability to handle the side effects of speculatively executed instructions. This limitation greatly reduces the amount of ILP available for the compiler to exploit, so it is an open question whether a simple VLIW machine really outperforms a superscalar machine.

Recent VLIW studies have proposed hardware mechanisms that remove the restrictions imposed on the compiler's scheduling (e.g., guarding [3] and boosting [11]). We recently proposed a mechanism called predicating [2], which allows the compiler unconstrained speculative code motion. Although that paper reported both a large ILP improvement and the simplicity of the mechanism, it left two questions open: how large a cycle time penalty the hardware imposes, and how much overall performance improves as a result. This paper answers these questions. Specifically, we estimate the performance improvement of a superscalar machine, a simple VLIW machine, and our VLIW machine with predicating over a scalar machine by evaluating both the cycle count and the cycle time. We have built an instruction scheduler and simulators for the cycle count evaluation, and have designed the critical hardware for the cycle time evaluation.

Section 2 describes the three ILP architectures we evaluated. Section 3 discusses the complexity of each architecture. Section 4 describes the evaluated machine models. Section 5 presents the evaluation results for cycle count, cycle time, and the resulting performance. Finally, Section 6 concludes the paper.

We evaluated three ILP machines: a simple VLIW machine, a VLIW machine with predicating, and a superscalar machine. The simple VLIW machine lies at one extreme of the ILP design space in terms of simplicity, and may therefore run at the fastest clock rate. Although the compiler schedules instructions, the amount of exploitable ILP is severely limited by the machine's limited hardware support for speculative execution. Thus, of the …
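The evaluation combines two separately measured quantities, because execution time is cycle count multiplied by cycle time: a machine that saves cycles but lengthens its clock period can still lose. The following is a minimal sketch of how a speedup over the scalar baseline can be computed from the two measurements and split into a cycle-count gain and a clock-rate factor. The class and function names (`Machine`, `speedup_over`, `speedup_factors`) and all numbers are hypothetical placeholders for illustration, not the authors' tools or results.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """One machine model; both fields are inputs measured separately."""
    name: str
    cycle_count: int      # cycles to run a program, e.g. from a cycle-level simulator
    cycle_time_ns: float  # clock period, e.g. from a critical-path hardware design

    def execution_time_ns(self) -> float:
        # Execution time = cycle count x cycle time.
        return self.cycle_count * self.cycle_time_ns


def speedup_over(baseline: Machine, machine: Machine) -> float:
    """Speedup of `machine` relative to `baseline` (> 1.0 means faster)."""
    return baseline.execution_time_ns() / machine.execution_time_ns()


def speedup_factors(baseline: Machine, machine: Machine) -> tuple[float, float]:
    """Split the speedup into a cycle-count gain and a clock-rate factor.

    The product of the two factors equals speedup_over(baseline, machine);
    a clock-rate factor below 1.0 is the cycle time penalty.
    """
    cycle_gain = baseline.cycle_count / machine.cycle_count
    clock_factor = baseline.cycle_time_ns / machine.cycle_time_ns
    return cycle_gain, clock_factor


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not results from the paper.
    scalar = Machine("scalar", cycle_count=1_000_000, cycle_time_ns=10.0)
    wide = Machine("hypothetical ILP machine", cycle_count=500_000, cycle_time_ns=12.5)

    gain, clock = speedup_factors(scalar, wide)
    print(f"cycle gain {gain:.2f} x clock factor {clock:.2f} "
          f"= speedup {speedup_over(scalar, wide):.2f}")
```

With these placeholder inputs the script prints `cycle gain 2.00 x clock factor 0.80 = speedup 1.60`: halving the cycle count is partly offset by a 25% longer clock period. Keeping the two factors separate mirrors the paper's approach of measuring cycle count by simulation and cycle time by hardware design.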

Hideki Ando | Masao Nakaya | Chikako Nakanishi | Tetsuya Hara

[1] Kazuaki Murakami et al. SIMP (Single Instruction Stream/Multiple Instruction Pipelining): A Novel High-Speed Single-Processor Architecture. The 16th Annual International Symposium on Computer Architecture, 1989.

[2] Scott Mahlke et al. Effective Compiler Support for Predicated Execution Using the Hyperblock. MICRO, 1992.

[3] Michael D. Smith et al. Boosting Beyond Static Scheduling in a Superscalar Processor. The 17th Annual International Symposium on Computer Architecture, 1990.