Yield and Speedup Improvements in Extensible Processors by Allocating Extra Cycles to Some Custom Instructions

In this article, we investigate techniques for mitigating the impact of process variations on the custom functional unit (CFU) of extensible processors. The techniques are allocating extra cycles to the CFU and extending the clock period of the extensible processor. The first technique provides an extra clock cycle to those custom instructions (CIs) whose timing yield is smaller than one. For this purpose, we use a lookup table (LUT) for each fabricated processor. A post-fabrication analysis determines which CIs need an extra clock cycle. Consequently, CI timing violations are prevented, and all manufactured extensible processors work with a predefined clock cycle time. To study how the objective function used during the CI selection phase affects the efficacy of the suggested architectural technique, we investigate three different objective functions. In the second technique, the clock period is extended to guarantee a design yield of one. Our results demonstrate that combining both techniques helps increase the speedup achieved by the extensible processor. To assess the efficacy of the proposed methods, we use several benchmarks from different application domains. The results show that the suggested techniques considerably improve the speedups of the extensible processors compared with approaches that do not consider the impact of process variations.
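
As a rough illustration of the extra-cycle idea, the following Python sketch builds a per-chip table that marks which CIs need one additional cycle. It is not the article's implementation: the function names, the CI names, the delay values, and the assumption that post-fabrication analysis yields one measured delay per CI are all hypothetical.

```python
# Hypothetical sketch: per-chip lookup table of extra cycles for custom instructions.
# All names and numbers are illustrative assumptions, not taken from the article.

def build_extra_cycle_lut(ci_delays_ns, nominal_cycles, clock_period_ns):
    """Return {ci_name: 0 or 1}; 1 means the CI needs one extra cycle on this chip."""
    lut = {}
    for name, delay in ci_delays_ns.items():
        budget = nominal_cycles[name] * clock_period_ns
        if delay <= budget:
            lut[name] = 0  # meets timing with its nominal cycle count
        elif delay <= budget + clock_period_ns:
            lut[name] = 1  # one extra cycle avoids the timing violation
        else:
            # Assumption: a single extra cycle is the only remedy modeled here.
            raise ValueError(f"{name} does not fit even with one extra cycle")
    return lut


def ci_latency_cycles(name, nominal_cycles, lut):
    """Effective latency of a CI on this chip, in clock cycles."""
    return nominal_cycles[name] + lut[name]


# Example chip: 2.0 ns clock, two CIs with assumed post-fabrication delays.
nominal = {"ci_a": 1, "ci_b": 2}
measured = {"ci_a": 1.8, "ci_b": 4.3}
lut = build_extra_cycle_lut(measured, nominal, clock_period_ns=2.0)
print(lut)
print(ci_latency_cycles("ci_b", nominal, lut))
```

In this sketch every chip keeps the same predefined clock period, and only the CIs flagged in its own table pay the extra-cycle penalty. A CI that would not fit even with the extra cycle is rejected here; handling that case (for example, by extending the clock period) is outside the sketch.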
