Code transformation strategies for extensible embedded processors

Embedded application requirements, including high performance, low power consumption and fast time to market, are uncommon in the broader domain of general-purpose applications. To satisfy these demands, chip manufacturers often allow developers to define application-specific Instruction Set Extensions (ISEs). Many techniques have been proposed that automatically identify the most beneficial ISEs from source code, so that compilers can identify the 'best' instruction set for the underlying machine. However, can we simply retrofit these techniques into a traditional compiler, or does ISE identification demand different tuning of the heuristics used throughout the optimization pipeline? In this paper, we show why compilers should sometimes make different decisions when targeting customized processors, and we show how traditional ISE identification techniques can improve significantly if the code is properly transformed to expose more beneficial extensions. The proposed approach was validated using the SimpleScalar simulator for the ARM processor, augmented with the ability to define additional instructions. Using benchmarks taken from the MiBench suite, we show that the proposed transformations improve state-of-the-art ISE identification techniques by 55% on average and by up to 4x.
