An on-chip multiprocessor architecture with a non-blocking synchronization mechanism

The growing perception that the superscalar approach is reaching its limits has motivated studies of on-chip multiprocessor (MP) architectures as an alternative. This paper proposes a new MP architecture, called SKY, which efficiently exploits thread-level parallelism using register-value communication and synchronization. What most distinguishes SKY from previously proposed MP architectures is its non-blocking synchronization mechanism. It allows any subsequent instruction that is independent of the instructions waiting for registers to be executed, so out-of-order execution continues regardless of inter-thread communication and synchronization. Our evaluation on the SPECint95 benchmark programs shows that SKY with two processors achieves a speedup of up to 40%, and 12% on average, over a much more complex single wide-issue superscalar processor with nearly the same amount of hardware.
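The difference between blocking and non-blocking register synchronization can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are not taken from the paper: a single-issue core, one-cycle instruction latency, each register written at most once, and a hypothetical `arrival` table giving the cycle at which a register value sent by another thread becomes available. The function name `run` and the example program are likewise illustrative, not SKY's actual hardware or instruction set.

```python
import math

def run(program, arrival, non_blocking):
    """Return the number of cycles needed to execute `program`.

    program: list of (dest_register, [source_registers]) in program order.
    arrival: register -> cycle at which a value from another thread arrives
             (hypothetical stand-in for inter-thread register communication).
    """
    # Cycle at which each register value becomes available.
    ready_at = dict(arrival)
    for dest, _ in program:
        ready_at[dest] = math.inf  # not produced yet by the local thread
    pending = list(program)
    cycle = 0
    while pending:
        # Blocking: only the oldest instruction may issue, so a wait on a
        # register stalls everything behind it.
        # Non-blocking: any pending instruction whose sources are ready may
        # issue, oldest first.
        candidates = pending if non_blocking else pending[:1]
        chosen = None
        for instr in candidates:
            _, srcs = instr
            if all(ready_at.get(r, 0) <= cycle for r in srcs):
                chosen = instr
                break
        if chosen is not None:
            pending.remove(chosen)
            ready_at[chosen[0]] = cycle + 1  # result usable next cycle
        cycle += 1
    return cycle

program = [
    ("r1", ["r9"]),  # waits for r9 from the other thread
    ("r2", ["r1"]),  # depends on the waiting instruction
    ("r3", ["r4"]),  # independent of the wait
    ("r5", ["r3"]),  # independent of the wait
]
arrival = {"r9": 3}  # r9 arrives at cycle 3

blocking_cycles = run(program, arrival, non_blocking=False)
non_blocking_cycles = run(program, arrival, non_blocking=True)
print(blocking_cycles, non_blocking_cycles)
```

In this example the two independent instructions fill the cycles spent waiting for `r9` when synchronization is non-blocking, whereas the blocking variant leaves those cycles idle.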
