Design Space Exploration for Memory Subsystems of VLIW Architectures

In this work, we present a design space exploration of the memory subsystem of our configurable CoreVA VLIW architecture. The development of resource-efficient processor architectures is based on a two-stage tool flow that uses a high-level processor specification as a reference. We evaluate several memory configurations, such as one or two memory ports, as well as different write-miss allocation modes. Applications ranging from the LTE protocol stack and baseband processing to cryptography and multimedia are evaluated in terms of execution time and energy efficiency. Our analyses show that an application-specific configuration of the memory subsystem can yield energy improvements of up to 25%. Our environment allows rapid profiling and evaluation of algorithms in order to choose the most efficient configuration.
