Exact and approximate algorithms for the extension of embedded processor instruction sets

In embedded computing, cost, power, and performance constraints call for the design of specialized processors rather than the use of existing off-the-shelf solutions. While these application-specific CPUs could be designed from scratch, a cheaper and more effective option is to extend existing processors and toolchains. Extensibility is indeed a feature now offered in real designs, e.g., by processors such as Tensilica Xtensa [T. R. Halfhill, Microprocess Rep., 2003], ARC ARCtangent [T. R. Halfhill, Microprocess Rep., 2000], STMicroelectronics ST200 [P. Faraboschi, G. Brown, J. A. Fisher, G. Desoli, and F. Homewood, Proc. 27th Annu. Int. Symp. Computer Architecture, 2000, p. 203], and MIPS CorExtend [T. R. Halfhill, Microprocess Rep., 2003]. While all these processors provide development environments with simulation capabilities for efficiently evaluating hand-crafted solutions, tools that automatically identify the best processor configuration for a given application are less common. In particular, methods for choosing specialized instruction-set extensions (ISEs) have been investigated in recent years but are still seldom part of commercial toolchains. This paper provides a formal methodology and a set of algorithms that help address this problem. It proposes exact algorithms to derive optimal ISEs; exact identification of a single ISE is applicable to basic blocks of up to 1500 assembler-like instructions. This paper also introduces approximate methods that can process larger basic blocks. Results show that the described algorithms find solutions close to those that a designer would obtain by a detailed study of the application code. Both heuristic and exact algorithms find ISEs able to speed up unextended processors by up to 5.0x. State-of-the-art comparisons show that the presented algorithms outperform existing ones by up to 2.6x.
