Instruction cache design space exploration for embedded software applications

Embedded processors use cache memories to improve overall system performance. To balance cache cost against performance, an oversized cache must be avoided. A quick estimate of cache size early in the design cycle can help the system architect divide the available chip area among the processing core, cache memory, register file and other system components. There are two general approaches to exploring the design space: exhaustive simulation and analytical modeling. In general, at a given abstraction level, the former is very time consuming, whereas the latter often lacks accuracy. This paper discusses a hybrid approach that combines analysis and simulation to determine the “best” cache size, beyond which performance penalties are very high, for embedded applications. It proposes a rapid technique in which static analysis first restricts the design space of L1 instruction cache sizes and simulation is then run only over that restricted space. The technique confines the search to 20% to 30% of the total search space for different applications of the MiBench benchmark suite.
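The idea of simulating only a restricted range of cache sizes can be illustrated with a minimal sketch. This is not the paper's implementation: the direct-mapped cache model, the 16-byte line size, the toy instruction trace, and the bounds `lower=128` and `upper=512` (standing in for whatever a static analysis step would produce) are all assumptions made for illustration, and the function names `miss_count` and `restricted_sweep` are hypothetical.

```python
def miss_count(trace, cache_size, line_size=16):
    """Count misses of a direct-mapped instruction cache on an address trace."""
    num_lines = cache_size // line_size
    tags = [None] * num_lines          # block number currently held by each line
    misses = 0
    for addr in trace:
        block = addr // line_size      # memory block containing this instruction
        index = block % num_lines      # cache line the block maps to
        if tags[index] != block:
            tags[index] = block        # fetch the block, evicting the old one
            misses += 1
    return misses


def restricted_sweep(trace, all_sizes, lower, upper, line_size=16):
    """Simulate only the cache sizes that fall within [lower, upper]."""
    candidates = [s for s in all_sizes if lower <= s <= upper]
    return {s: miss_count(trace, s, line_size) for s in candidates}


# Toy trace (assumption): a 256-byte loop of 4-byte instructions, run 10 times.
trace = [pc for _ in range(10) for pc in range(0, 256, 4)]

# Full design space (assumption): power-of-two sizes from 64 B to 32 KiB.
all_sizes = [2 ** k for k in range(6, 16)]

# Hypothetical bounds, as if supplied by a prior static analysis step.
results = restricted_sweep(trace, all_sizes, lower=128, upper=512)
fraction_simulated = len(results) / len(all_sizes)
```

In this toy setup only three of the ten candidate sizes are simulated, and the miss count stops improving once the cache holds the whole loop, which is the kind of knee in the curve that a "best" size is meant to capture.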
