Implementation-aware selection of the custom instruction set for extensible processors

Abstract: This paper presents an approach for incorporating the effects of logic synthesis options and logic-level implementations into custom instruction (CI) selection for extensible processors. These effects make a piecewise continuous spectrum of delay versus area choices available for each CI, which in turn influences the selection of the CI set that maximizes the speedup per area cost (SPA) metric. We evaluate the proposed approach by applying it to several benchmarks and comparing the results with those of a conventional technique. We also apply the methodology to existing serialization algorithms that relax register file constraints in multi-cycle custom instruction design. The comparison shows considerable improvements in speedup per area over conventional custom instruction selection algorithms under the same area-budget constraint.
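To make the idea of implementation-aware selection concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each CI offers a few discrete (area, cycles saved) implementation points sampled from its delay versus area spectrum, and it assumes SPA is defined as overall speedup divided by the total area of the selected CIs. The function name `select_cis`, the data layout, and the exhaustive search are all hypothetical choices made for illustration.

```python
from itertools import product


def select_cis(candidates, base_cycles, area_budget):
    """Pick at most one implementation point per CI by exhaustive search.

    candidates:  dict mapping CI name -> list of (area, cycles_saved) points
                 (hypothetical samples of the CI's delay/area trade-off).
    base_cycles: cycle count of the application without any CI.
    area_budget: maximum total area allowed for the selected CIs.

    Returns (best_spa, chosen), where chosen maps CI name -> selected point.
    SPA is ASSUMED here to be speedup / total_area.
    """
    names = list(candidates)
    # None stands for "do not implement this CI at all".
    options = [[None] + list(candidates[name]) for name in names]

    best_spa, best_choice = 0.0, {}
    for combo in product(*options):
        picked = {n: p for n, p in zip(names, combo) if p is not None}
        area = sum(a for a, _ in picked.values())
        # Skip the empty selection (area 0) and over-budget selections.
        if not picked or area > area_budget:
            continue
        saved = sum(s for _, s in picked.values())
        speedup = base_cycles / (base_cycles - saved)
        spa = speedup / area
        if spa > best_spa:
            best_spa, best_choice = spa, picked
    return best_spa, best_choice


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    candidates = {
        "ci_a": [(100, 100), (120, 400)],  # small/slow vs. larger/faster
        "ci_b": [(90, 100)],
    }
    spa, chosen = select_cis(candidates, base_cycles=1000, area_budget=200)
    print(chosen, spa)
```

In this toy input, a selector that only knew the smaller implementation of `ci_a` would miss the larger but much faster point, which is the kind of effect the abstract attributes to considering multiple logic-level implementations. Exhaustive enumeration is used only because it is easy to verify; it does not scale beyond a handful of candidates.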
