Smart Memories: a modular reconfigurable architecture

Trends in VLSI technology scaling demand that future computing devices be narrowly focused to achieve high performance and high efficiency, yet also target the high volumes and low costs of widely applicable general-purpose designs. To address these conflicting requirements, we propose a modular reconfigurable architecture called Smart Memories, targeted at computing needs in the 0.1 μm technology generation. A Smart Memories chip is made up of many processing tiles, each containing local memory, local interconnect, and a processor core. For efficient computation across a wide class of applications, the memories, the wires, and the computational model can all be altered to match the application. To show the applicability of this design, two very different machines at opposite ends of the architectural spectrum, the Imagine stream processor and the Hydra speculative multiprocessor, are mapped onto the Smart Memories computing substrate. Simulations of the mappings show that the Smart Memories architecture can host both machines with only modest performance degradation.
