An on-chip multiprocessor architecture with a non-blocking synchronization mechanism

The growing perception that the superscalar approach is reaching its limits is driving studies of on-chip multiprocessor (MP) architectures as an alternative. This paper proposes a new MP architecture, called SKY, which efficiently exploits thread-level parallelism using register-value communication and synchronization. The feature that most distinguishes SKY from previously proposed MP architectures is its non-blocking synchronization mechanism. It allows any subsequent instruction that is independent of the instructions waiting for registers to execute, enabling continuous out-of-order execution regardless of inter-thread communication and synchronization. Our evaluation on the SPECint95 benchmark programs shows that SKY with two processors achieves a speedup of up to 40%, and 12% on average, over a single, much more complex wide-issue superscalar processor with nearly the same amount of hardware.
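To make the difference between blocking and non-blocking register synchronization concrete, here is a toy cycle-level model. It is a minimal sketch under simplifying assumptions of our own, not SKY's actual microarchitecture: every instruction has a one-cycle latency, each register is written at most once, up to `width` instructions issue per cycle from the pending window, and a register produced by another thread (the hypothetical `ra`) arrives at a fixed cycle. In the blocking model, the thread stops issuing at the first instruction still waiting for such a register; in the non-blocking model, only that instruction and its dependents wait.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str                   # register written (assumed written only once)
    srcs: tuple[str, ...] = ()  # registers read


def simulate(program, remote_arrival, width=2, non_blocking=True):
    """Return the number of cycles needed to issue and complete `program`.

    Toy model (hypothetical, for illustration only):
    - remote_arrival maps a register to the cycle its value arrives from
      another thread (inter-thread register communication);
    - registers neither remote nor written by `program` are ready at cycle 0;
    - every instruction has a 1-cycle latency.
    """
    ready = {ins.dest: math.inf for ins in program}  # not produced yet
    ready.update(remote_arrival)
    pending = list(program)  # kept in program order
    cycle = 0
    while pending:
        issued = []
        for ins in pending:
            if len(issued) == width:
                break
            waits_remote = any(
                r in remote_arrival and remote_arrival[r] > cycle for r in ins.srcs
            )
            if waits_remote and not non_blocking:
                break  # blocking sync: no younger instruction may issue
            if all(ready.get(r, 0) <= cycle for r in ins.srcs):
                issued.append(ins)  # operands available: issue out of order
        for ins in issued:
            pending.remove(ins)
            ready[ins.dest] = cycle + 1  # result visible next cycle
        cycle += 1
    return cycle


# x and y depend on the remote register ra; a, b, c, d are independent of it.
PROGRAM = [
    Instr("x", ("ra",)),
    Instr("y", ("x",)),
    Instr("a"),
    Instr("b", ("a",)),
    Instr("c", ("a",)),
    Instr("d", ("b", "c")),
]

if __name__ == "__main__":
    print(simulate(PROGRAM, {"ra": 5}, non_blocking=True))   # 7
    print(simulate(PROGRAM, {"ra": 5}, non_blocking=False))  # 9
```

With the remote value arriving at cycle 5, the independent chain `a`, `b`, `c`, `d` completes while `x` is still waiting in the non-blocking model, so the program finishes in 7 cycles instead of 9. This only illustrates the principle described in the abstract; it says nothing about the magnitude of SKY's measured speedups.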
