Static Timing Analysis Based Transformations of Super-Complex Instruction Set Hardware Functions

Application-specific hardware implementations are an increasingly popular way of reducing execution time and power consumption in embedded systems. Such hardware typically requires only a small fraction of the execution time and power of the equivalent software code. Modern electronic design automation (EDA) tools can apply a variety of transformations to hardware blocks to achieve additional performance and power savings. A number of these transformations require a tool with knowledge of the design's timing characteristics. This thesis describes a static timing analyzer and two design automation tools based on timing analysis. The static timing analyzer estimates the worst-case timing characteristics of a hardware data flow graph; these graphs are intermediate representations generated within a C to VHDL hardware acceleration compiler. Two EDA tools that use static timing analysis were then developed. The first is an automated pipelining tool that increases the throughput of large blocks of combinational logic generated by the hardware acceleration compiler. The second attempts to mitigate the power consumed by extraneous combinational switching: by inserting special signal buffers with preselected propagation delays, known as delay elements, it keeps combinational functional units inactive until their inputs have stabilized. The hardware descriptions generated by both tools were synthesized, simulated, and power profiled using existing commercial EDA tools. On a set of signal and image processing benchmarks, pipelining yielded an average performance increase of 3.3x, while delay elements saved between 25% and 33% of power consumption.
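As a rough illustration of the idea behind static timing analysis on a data flow graph, the following minimal sketch computes worst-case arrival times by visiting nodes in topological order. It is not the thesis's analyzer: the function names (`arrival_times`, `input_stable_times`), the graph encoding, and all delay values are hypothetical, chosen only to show the longest-path computation and how the resulting input-stable times could, in principle, guide the choice of a delay element's propagation delay.

```python
from graphlib import TopologicalSorter


def arrival_times(delays, preds):
    """Worst-case time at which each node's output settles.

    delays: node -> propagation delay of that node (hypothetical units, e.g. ns)
    preds:  node -> list of nodes feeding it (nodes without an entry are inputs)
    """
    # Make sure every node appears in the graph, even if it has no predecessors.
    graph = {node: preds.get(node, ()) for node in delays}
    arrival = {}
    for node in TopologicalSorter(graph).static_order():
        # A node can only start settling once its slowest input has arrived.
        latest_input = max((arrival[p] for p in graph[node]), default=0.0)
        arrival[node] = latest_input + delays[node]
    return arrival


def input_stable_times(delays, preds):
    """Time at which all inputs of each node are stable (assumed use: the
    delay a delay element would need before enabling that node)."""
    arrival = arrival_times(delays, preds)
    return {
        node: max((arrival[p] for p in preds.get(node, ())), default=0.0)
        for node in delays
    }


# Hypothetical data flow graph: out = (a * b) + a, with made-up delays.
delays = {"a": 1.0, "b": 2.0, "mul": 4.0, "add": 2.0, "out": 0.5}
preds = {"mul": ["a", "b"], "add": ["mul", "a"], "out": ["add"]}

arrival = arrival_times(delays, preds)
stable = input_stable_times(delays, preds)
critical_path_delay = max(arrival.values())
```

In this toy graph the adder's inputs are stable only once the multiplier has settled, so a delay element sized from `stable["add"]` would keep the adder from switching on intermediate values. A pipelining tool could use the same arrival times to decide where to place registers, though the sketch does not attempt that.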
