论文信息 - Performance Comparison of ILP Machines with Cycle Time Evaluation

Performance Comparison of ILP Machines with Cycle Time Evaluation

Many studies have investigated performance improvement through exploiting instruction-level parallelism (ILP) with a particular architecture. Unfortunately, these studies indicate performance improvement using the number of cycles that are required to execute a program, but do not quantitatively estimate the penalty imposed on the cycle time from the architecture. Since the performance of a microprocessor must be measured by its execution time, a cycle time evaluation is required as well as a cycle count speedup evaluation. Currently, superscalar machines are widely accepted as the machines which achieve the highest performance. On the other hand, because of hardware simplicity and instruction scheduling sophistication, there is a perception that the next generation of microprocessors will be implemented with a VLIW architecture. A simple VLIW machine, however, has a serious weakness regarding speculative execution. Thus, it is a question whether a simple VLIW machine really outperforms a superscalar machine. We recently proposed a mechanism called predicating that supports speculative execution for the VLIW machine, and showed a significant cycle count speedup over a scalar machine. Although the mechanism is simple, it is unknown how much it imposes a penalty on the cycle time, and how much the performance is improved as a result. This paper evaluates both the cycle count speedup and the cycle time for three ILP machines: a superscalar machine, a simple VLIW machine, and the VLIW machine with predicating. The evaluation results show that the simple VLIW machine slightly outperforms the superscalar machine, while the VLIW machine with predicating achieves a significant speedup of 1.41x over the superscalar machine.

H. Ando | M. Nakaya | C. Nakanishi | T. Hara

[1] N. Irie,et al. SIMP (Single Instruction stream/Multiple instruction Pipelining): a novel high-speed single-processor architecture , 1989, ISCA '89.

[2] Gerry Kane,et al. MIPS RISC Architecture , 1987 .

[3] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1995 .

[4] Andrew R. Pleszkun,et al. Implementation of precise interrupts in pipelined processors , 1985, ISCA '98.

[5] Hideki Ando,et al. Unconstrained speculative execution with predicated state buffering , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[6] Mike Johnson,et al. Superscalar microprocessor design , 1991, Prentice Hall series in innovative technology.

[7] Michael D. Smith,et al. Boosting beyond static scheduling in a superscalar processor , 1990, ISCA '90.

[8] Colin Camerer. Judgment and decision making, J. Frank Yates. Englewood Cliffs, New Jersey, Prentice-Hall inc. 1990 , 1991 .

[9] Alan Jay Smith,et al. Branch Prediction Strategies and Branch Target Buffer Design , 1995, Computer.

[10] Peter Y.-T. Hsu,et al. Highly concurrent scalar processing , 1986, ISCA '86.

[11] Scott A. Mahlke,et al. Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 25.

[12] Scott Mahlke,et al. Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 1992.

[13] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.