Custom-instruction synthesis for extensible-processor platforms

Efficiency and flexibility are critical, but often conflicting, design goals in embedded system design. The recent emergence of extensible processors promises a favorable tradeoff between efficiency and flexibility, while keeping design turnaround times short. Current extensible processor design flows automate several tedious tasks, but typically require designers to manually select the parts of the program that are to be implemented as custom instructions. In this work, we describe an automatic methodology to select custom instructions to augment an extensible processor, in order to maximize its efficiency for a given application program. We demonstrate that the number of custom instruction candidates grows rapidly with program size, leading to a large design space, and that the quality (speedup) of custom instructions varies significantly across this space, motivating the need for the proposed flow. Our methodology features cost functions to guide the custom instruction selection process, as well as static and dynamic pruning techniques to eliminate inferior parts of the design space from consideration. Furthermore, we employ a two-stage process, wherein a limited number of promising instruction candidates are first short-listed using efficient selection criteria, and then evaluated in more detail through cycle-accurate instruction set simulation and synthesis of the corresponding hardware, to identify the custom instruction combinations that result in the highest program speedup or maximize speedup under a given area constraint. We have evaluated the proposed techniques using a state-of-the-art extensible processor platform, in the context of a commercial design flow. 
Experiments with several benchmark programs indicate that custom processors synthesized using automatic custom instruction selection can result in large improvements in performance (up to 5.4×, an average of 3.4×), energy (up to 4.5×, an average of 3.2×), and energy-delay products (up to 24.2×, an average of 12.6×), while speeding up the design process significantly.
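
The two-stage selection described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the `Candidate` fields, the gain-per-area ranking used for short-listing, and the `toy_evaluate` function (a stand-in for cycle-accurate instruction set simulation and hardware synthesis) are all assumptions made for illustration, and the numbers are invented.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    """Hypothetical custom-instruction candidate (fields are assumptions)."""
    name: str
    est_gain: float  # cheap estimate of cycles saved
    area: float      # estimated hardware area, arbitrary units


def shortlist(candidates, keep):
    """Stage 1: rank by an inexpensive cost function and keep the top few.

    Gain per unit area is an assumed criterion, chosen only for illustration.
    """
    ranked = sorted(candidates, key=lambda c: c.est_gain / c.area, reverse=True)
    return ranked[:keep]


def select(candidates, keep, area_budget, evaluate):
    """Stage 2: evaluate combinations of short-listed candidates in detail.

    `evaluate` stands in for the expensive step (simulation and synthesis)
    and returns the program speedup for a combination.
    """
    best_combo, best_speedup = (), 1.0
    short = shortlist(candidates, keep)
    for r in range(1, len(short) + 1):
        for combo in combinations(short, r):
            if sum(c.area for c in combo) > area_budget:
                continue  # prune combinations that violate the area constraint
            speedup = evaluate(combo)
            if speedup > best_speedup:
                best_combo, best_speedup = combo, speedup
    return best_combo, best_speedup


BASE_CYCLES = 1000.0  # invented baseline cycle count


def toy_evaluate(combo):
    """Toy model: each instruction removes its estimated gain from the baseline."""
    return BASE_CYCLES / (BASE_CYCLES - sum(c.est_gain for c in combo))


candidates = [
    Candidate("A", est_gain=300, area=10),
    Candidate("B", est_gain=200, area=4),
    Candidate("C", est_gain=50, area=5),
    Candidate("D", est_gain=10, area=8),
]

best, speedup = select(candidates, keep=3, area_budget=14, evaluate=toy_evaluate)
print(sorted(c.name for c in best), speedup)
```

Exhaustive enumeration is only workable here because stage 1 keeps the short list small; with a realistic number of candidates, the enumeration would need the kind of pruning the abstract mentions.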
