A SHARED MEMORY MODULE FOR AN ASYNCHRONOUS ARRAY OF SIMPLE PROCESSORS

This paper presents the design of an asynchronously shared memory module for the AsAP platform. AsAP consists of a 2-dimensional array of processing elements with limited memory resources. The memory module expands the storage capacity available to AsAP processors, enabling the mapping of applications with large working sets. The module described shares an 8 K-word SRAM among four processors, but can support a 64 K-word SRAM with no additional changes. The memory module is independently clocked and supports hardware address generation, mutual exclusion, and multiple addressing modes. Simultaneous accesses by different processors are arbitrated using a least-recently-serviced priority scheme. A standard cell implementation of the memory module cycles at 555 MHz and occupies 1.2 mm² in 0.18 μm CMOS.
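The least-recently-serviced arbitration mentioned above can be illustrated with a small behavioral model. The sketch below is only an assumption about how such a scheme might behave, not the hardware described in the paper: the class name, the `grant` method, and the one-grant-per-cycle policy are hypothetical. It keeps the four ports in a priority queue and moves each serviced port to the back.

```python
from collections import deque


class LeastRecentlyServicedArbiter:
    """Behavioral sketch (hypothetical, not the paper's RTL).

    The port at the front of the queue was serviced longest ago and
    therefore has the highest priority.
    """

    def __init__(self, num_ports=4):
        # Initial priority order is an assumption: port 0 first.
        self._order = deque(range(num_ports))

    def grant(self, requests):
        """Return the granted port for this cycle, or None if idle.

        `requests` is a set of port indices requesting access.
        """
        # Iterate over a snapshot so the queue can be reordered safely.
        for port in list(self._order):
            if port in requests:
                # The serviced port drops to the lowest priority.
                self._order.remove(port)
                self._order.append(port)
                return port
        return None


if __name__ == "__main__":
    arbiter = LeastRecentlyServicedArbiter(num_ports=4)
    # Ports 0 and 3 contend on every cycle; grants alternate between them.
    for cycle in range(4):
        print(cycle, arbiter.grant({0, 3}))
```

In this model a port that keeps requesting cannot starve another requester, because every grant sends the winner to the back of the queue.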
