Design space exploration for an embedded processor with flexible datapath interconnect

The design of an embedded processor depends on its application domain. Traditionally, domain-specific design solutions have come in three forms: VLIW-based DSP processors, ASICs and FPGAs, which offer generality across the application domain, energy efficiency and flexibility, respectively. However, matching hardware resources to the needs of an application domain opens up a huge design space. We present FlexTools, a tool framework built around the FlexCore architecture for evaluating performance and energy efficiency across different applications. We demonstrate FlexTools for design space exploration, focusing on the data-routing flexibility of the FlexCore processor, in search of energy-efficient interconnect configurations that are efficient in both cycle count and hardware. Evaluation results suggest that a well-optimized instance of a 65-nm multiplier-extended FlexCore processor datapath, obtained using FlexTools, executes nine integer EEMBC benchmarks with a 15% lower cycle count and 17% lower energy dissipation than a reference MIPS datapath.
