A Decoupled Architecture of Processors with Scratch-Pad Memory Hierarchy

This paper presents a decoupled processor architecture whose memory hierarchy consists only of scratch-pad memories and a main memory. The decoupled architecture also exploits the parallelism between address computation and the processing of application data. The application code is split into two programs: the first computes the addresses of the data in the memory hierarchy, and the second processes the application data. The first program is executed by one of the decoupled processors, called Access, which uses compiler methods for placing data in the memory hierarchy. In parallel, the second program is executed by the other processor, called Execute. The memory hierarchy and the Execute processor are synchronized through a simple handshake protocol. The Access processor requires tight communication with the memory hierarchy, which strongly differentiates it from traditional uniprocessors. The architecture is compared in performance with the MIPS IV architecture of SimpleScalar and with existing decoupled architectures, and it shows higher normalized performance. Experimental results show that performance improves by up to 3.7 times. Compared with MIPS IV, the proposed architecture achieves this performance with insignificant area overhead.
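The Access/Execute split described above can be illustrated with a small software analogy. This is only a minimal sketch under stated assumptions, not the paper's hardware design: the two processors are modelled as Python threads, the scratch-pad is modelled as a bounded `queue.Queue`, and the blocking `put`/`get` calls stand in for the handshake. All names here (`access_program`, `execute_program`, `run_decoupled`, `capacity`) are hypothetical and do not come from the paper.

```python
import queue
import threading


def access_program(main_memory, addresses, scratch_pad):
    """Hypothetical Access side: walks the addresses and stages data."""
    for addr in addresses:
        # put() blocks while the buffer is full, which plays the role
        # of the "not ready" half of the handshake.
        scratch_pad.put(main_memory[addr])
    scratch_pad.put(None)  # sentinel: no more data will arrive


def execute_program(scratch_pad, results):
    """Hypothetical Execute side: consumes operands, computes no addresses."""
    total = 0
    while True:
        value = scratch_pad.get()  # blocks until an operand is available
        if value is None:
            break
        total += value * value  # stand-in for the application kernel
    results.append(total)


def run_decoupled(main_memory, addresses, capacity=4):
    """Run both programs in parallel over a bounded scratch-pad buffer."""
    scratch_pad = queue.Queue(maxsize=capacity)  # assumed buffer size
    results = []
    access = threading.Thread(
        target=access_program, args=(main_memory, addresses, scratch_pad)
    )
    execute = threading.Thread(
        target=execute_program, args=(scratch_pad, results)
    )
    access.start()
    execute.start()
    access.join()
    execute.join()
    return results[0]
```

Calling `run_decoupled(list(range(16)), range(0, 16, 2))` sums the squares of every second element. The point of the sketch is that the Execute side never touches an address; it only sees a stream of operands, and the bounded buffer keeps the two sides in step.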
