Speculative DMA for architecturally visible storage in instruction set extensions

Instruction set extensions (ISEs) can accelerate embedded processor performance. Many algorithms for ISE generation have shown good potential; some of them have recently been expanded to include Architecturally Visible Storage (AVS) - compiler-controlled memories, similar to scratchpads, that are accessible only to ISEs. To achieve a speedup using AVS, Direct Memory Access (DMA) transfers are required to move data from the main memory to the AVS; unfortunately, this creates coherence problems between the AVS and the cache, which previous methods for ISEs with AVS failed to address; additionally, these methods need to leave many conservative DMA transfers in place, whose execution significantly limits the achievable speedup. This paper presents a memory coherence scheme for ISEs with AVS, which can ensure execution correctness and memory consistency with minimal area overhead. We also present a method that speculatively removes redundant DMA transfers. Cycle-accurate experimental results were obtained using an FPGA-emulation platform. These results show that the application-specific instruction-set extended processors with speculative DMA-enhanced AVS gain significantly over previous techniques, despite the overhead of the coherence mechanism.

[1]  Nikil D. Dutt,et al.  Introduction of Architecturally Visible Storage in Instruction Set Extensions , 2007, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[2]  Anoop Gupta,et al.  Parallel computer architecture - a hardware / software approach , 1998 .

[3]  Paolo Ienne,et al.  Rethinking custom ISE identification: a new processor-agnostic method , 2007, CASES '07.

[4]  Janak H. Patel,et al.  A low-overhead coherence solution for multiprocessors with private cache memories , 1998, ISCA '98.

[5]  Paolo Ienne,et al.  Exact and approximate algorithms for the extension of embedded processor instruction sets , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[6]  Tulika Mitra,et al.  Scalable custom instructions identification for instruction-set extensible processors , 2004, CASES '04.

[7]  Luca Benini,et al.  Synthesis of application-specific memories for power optimization in embedded systems , 2000, Proceedings 37th Design Automation Conference.

[8]  Scott A. Mahlke,et al.  Automated custom instruction generation for domain-specific processor acceleration , 2005, IEEE Transactions on Computers.

[9]  Peter Marwedel,et al.  Assigning program and data objects to scratchpad for energy reduction , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[10]  Rainer Leupers,et al.  Customizable Embedded Processors: Design Technologies and Applications , 2006 .

[11]  Günhan Dündar,et al.  An integer linear programming approach for identifying instruction-set extensions , 2005, 2005 Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'05).