Application-specific architecture framework for high-performance low-power embedded computing

The design space of embedded systems is enormous, and embedded applications impose strict requirements on power consumption, performance, cost, and time to market. A microprocessor designed for embedded systems must satisfy all of these constraints at once. This dissertation proposes Framework-based Instruction-set Tuning Synthesis (FITS), a combined architectural and microarchitectural approach that addresses each of these requirements. FITS reduces power consumption by running applications with half the code size and much improved locality. This allows a smaller instruction cache to achieve higher hit rates while consuming less power. FITS improves performance through a custom-tailored, application-specific instruction set architecture (ISA) and through microarchitectural enhancements. The ISA is tailored by synthesizing an instruction set that precisely matches the requirements of the target application. The microarchitecture is enhanced by integrating two units: a Versatile Integrated Processing (VIP) unit and a Zero-Overhead Loop Execution (ZOLE) unit. The VIP unit is a general-purpose data-processing engine that delivers high performance for both computation and data streaming. The ZOLE unit streamlines program control flow by removing expensive loop-control overhead from both nested and non-nested loops. Both the architectural and the microarchitectural innovations rely on replacing the fixed instruction decoder of general-purpose embedded processors with a programmable decoder. A programmable decoder decouples the microarchitecture from the ISA, so designers can add new capabilities to the microarchitecture without being restricted by the limited instruction encoding space.
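To make the decoupling of ISA and microarchitecture more concrete, here is a toy Python sketch of a table-driven decoder. It is not the FITS decoder. The 16-bit instruction format, the 4-bit opcode field, the class names `ProgrammableDecoder`, `DecodeEntry` and `MicroOp`, and the operation names `add`/`addi` are all hypothetical. The sketch shows one idea only: the opcode indexes a decode table that is loaded per application, so the same instruction bits can select different microarchitectural operations depending on how the decoder was programmed.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


# Hypothetical operation that a fixed microarchitecture can execute.
@dataclass(frozen=True)
class MicroOp:
    op: str                      # illustrative operation name, e.g. "add"
    operands: Tuple[int, ...]    # decoded operand field values


# Hypothetical decode-table entry: which operation an opcode selects
# and how the remaining instruction bits split into operand fields.
@dataclass(frozen=True)
class DecodeEntry:
    op: str
    field_widths: Tuple[int, ...]  # operand field widths in bits, MSB first


class ProgrammableDecoder:
    """Toy decoder for an assumed 16-bit format with a 4-bit opcode.

    The opcode-to-operation mapping comes from a table supplied at
    construction time (standing in for per-application programming)
    instead of being hard-wired.
    """

    INSN_BITS = 16
    OPCODE_BITS = 4

    def __init__(self, table: Dict[int, DecodeEntry]):
        payload_bits = self.INSN_BITS - self.OPCODE_BITS
        for opcode, entry in table.items():
            if not 0 <= opcode < (1 << self.OPCODE_BITS):
                raise ValueError(f"opcode {opcode:#x} does not fit in {self.OPCODE_BITS} bits")
            if sum(entry.field_widths) != payload_bits:
                raise ValueError(f"fields of {entry.op!r} must cover exactly {payload_bits} bits")
        self.table = dict(table)

    def decode(self, insn: int) -> MicroOp:
        shift = self.INSN_BITS - self.OPCODE_BITS
        opcode = (insn >> shift) & ((1 << self.OPCODE_BITS) - 1)
        entry = self.table.get(opcode)
        if entry is None:
            raise ValueError(f"opcode {opcode:#x} is not programmed")
        operands = []
        for width in entry.field_widths:
            shift -= width  # step to the next field, most significant first
            operands.append((insn >> shift) & ((1 << width) - 1))
        return MicroOp(entry.op, tuple(operands))


# Two hypothetical applications program the same opcode (0x1) differently.
app_a = ProgrammableDecoder({0x1: DecodeEntry("add", (4, 4, 4))})  # three 4-bit register fields
app_b = ProgrammableDecoder({0x1: DecodeEntry("addi", (4, 8))})    # register plus 8-bit immediate

print(app_a.decode(0x1234))  # MicroOp(op='add', operands=(2, 3, 4))
print(app_b.decode(0x1234))  # MicroOp(op='addi', operands=(2, 52))
```

Only the table differs between the two applications; the logic behind `decode` stays the same. A fixed decoder would instead hard-wire one mapping for every program.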
A general-purpose, fully capable microarchitecture reduces design cost and shortens time to market. As a single-chip solution produced in volume, it amortizes high non-recurring engineering (NRE) costs and long design turnaround times through mass production. By combining a programmable decoder with an enhanced general-purpose microarchitecture equipped with VIP and ZOLE, FITS introduces a new class of embedded microprocessors. These processors achieve the performance and low energy consumption of application-specific processors while retaining the fabrication advantages of a mass-produced single-chip solution: low production cost and fast time to market.