The Challenges of Massive On-Chip Concurrency

Moore’s law describes the growth in on-chip transistor density, which doubles every 18 to 24 months, a trend that looks set to continue for at least a decade and possibly longer. This growth poses major problems (and provides opportunities) for computer architecture in this time frame. The problems arise from current architectural approaches, which do not scale well and have relied on clock speed rather than concurrency to increase performance. This, in turn, causes excessive power dissipation and circuit complexity. This paper takes a long-range position on the future of chip multiprocessors, from both the micro-architecture perspective and the systems perspective. Concurrency will come from many levels, with instruction-level and loop-level concurrency managed by the micro-architecture and higher levels managed by the system. We term chip-level multiprocessors that exploit massive concurrency Microgrids. The directions proposed in this paper provide micro-architectural concurrency with full forward compatibility over orders of magnitude of scaling, and also provide for the management of on-chip resources (processors, etc.) so that a system can be configured autonomously for a variety of goals (e.g. low power or high performance).
