Embedded intelligent SRAM

Many embedded systems pair a simple pipelined RISC processor for computation with an on-chip SRAM for data storage. We present an enhancement called Intelligent SRAM (ISRAM): a small computation unit with an accumulator, placed next to the on-chip SRAM. The unit can operate either on two words from the same SRAM row or on one word from the SRAM and one from the accumulator. Supporting it requires only a few additional instructions. We also present a computation partitioning algorithm that, given the data flow graph of a program, assigns each computation to either the processor or the new computation unit. Compared with performing the same operations on the processor, the enhancement reduces the number of SRAM accesses, instructions, and pipeline stalls, and these reductions produce the performance improvement. Experiments on various benchmarks show speedups of up to 1.46X.
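The operand rule of the computation unit (two words from the same SRAM row, or one SRAM word plus the accumulator) can be illustrated with a simple greedy partitioning pass. The Python sketch that follows is not the authors' algorithm, which the abstract does not detail. It assumes a data flow graph of binary operations already in topological order, a hypothetical `Mem(name, row)` descriptor for SRAM words, and an accumulator that holds only the result of the most recent ISRAM operation. The names `Mem`, `Op`, `can_run_on_isram`, and `partition` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Mem:
    """A data word held in on-chip SRAM; `row` is its (hypothetical) SRAM row index."""
    name: str
    row: int


@dataclass(frozen=True)
class Op:
    """A binary data flow graph node; operands are SRAM words or earlier Op results."""
    name: str
    kind: str  # e.g. "add", "sub", "mul" (informational only in this sketch)
    a: Union[Mem, "Op"]
    b: Union[Mem, "Op"]


def can_run_on_isram(op: Op, acc: Optional[Op]) -> bool:
    """Assumed ISRAM operand rule: two SRAM words from the same row,
    or one SRAM word plus the value currently held in the accumulator."""
    a, b = op.a, op.b
    if isinstance(a, Mem) and isinstance(b, Mem):
        return a.row == b.row
    if acc is not None:
        if isinstance(a, Mem) and b is acc:
            return True
        if isinstance(b, Mem) and a is acc:
            return True
    return False


def partition(ops: List[Op]) -> Dict[str, str]:
    """Hypothetical greedy partitioner: walk the ops in topological order and
    place each one on the ISRAM unit whenever its operands allow it, else on the CPU.
    Simplification: the cost of moving accumulator results to the CPU is ignored."""
    placement: Dict[str, str] = {}
    acc: Optional[Op] = None  # DFG node whose result currently sits in the accumulator
    for op in ops:
        if can_run_on_isram(op, acc):
            placement[op.name] = "ISRAM"
            acc = op  # the new result overwrites the accumulator
        else:
            placement[op.name] = "CPU"
    return placement


if __name__ == "__main__":
    a, b = Mem("a", row=0), Mem("b", row=0)
    c, d = Mem("c", row=1), Mem("d", row=2)
    t1 = Op("t1", "add", a, b)    # both words in SRAM row 0
    t2 = Op("t2", "add", t1, c)   # accumulator + SRAM word
    t3 = Op("t3", "mul", c, d)    # rows 1 and 2 differ
    t4 = Op("t4", "sub", t2, t3)  # two computed values
    print(partition([t1, t2, t3, t4]))
    # {'t1': 'ISRAM', 't2': 'ISRAM', 't3': 'CPU', 't4': 'CPU'}
```

A real partitioner would weigh the SRAM accesses, instructions, and pipeline stalls each placement saves rather than accepting every legal ISRAM placement; this greedy rule only shows how the operand constraints decide where an operation can run.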