Fast design space exploration for low-power configurable processors

Customizable and extensible processors (commonly known as “configurable processors” or application-specific instruction-set processors, ASIPs) can provide the flexibility of off-the-shelf processors with performance closer to that of custom logic. Manual configuration of an ASIP requires highly specialized knowledge of computer architecture and typically results in suboptimal architectures, leading to poor performance and higher costs. Ideally, the ASIP flow should be entirely automated; however, optimal solutions are guaranteed only by an exhaustive search of the design space. Unfortunately, an exhaustive search is computationally prohibitive, so the research community continues to study ways to find “good” solutions within a reasonable time. This dissertation presents new methods of design space exploration and fast architecture evaluation. These methods are intended to improve the automation and usability of ASIPs. Design space exploration is conducted using a novel approach in which the design space is modeled using a small sample of points. Each sample point is expensive to evaluate; however, the resulting design space model can then be used to quickly estimate all other points in the space. Non-parametric statistics are used to construct the model and, consequently, the precise nature of the design space need not be specified a priori. This approach provides a computationally efficient alternative to existing optimization heuristics, with the additional benefit that architectural trends and tradeoffs are easy to discover. Experiments were conducted using the proposed modeling approach to configure both the branch prediction unit (BPU) and the cache hierarchy of an embedded processor. Results showed that the approach could achieve a 100x speedup while providing near-optimal configurations. In addition, a fast performance estimation approach is proposed for evaluating configurations of instruction-set extensions. This approach considers pipeline effects and consequently improves the quality of results over existing approaches. This improvement is achieved while maintaining constant run-time complexity.
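
The sample-then-model idea described above can be illustrated with a minimal sketch. The abstract does not say which non-parametric technique the dissertation uses, so this example assumes a Gaussian-kernel (Nadaraya-Watson) regression purely for illustration. The `simulate` cost function, the two-parameter design space, and every name below are hypothetical and are not taken from the dissertation.

```python
import itertools
import math
import random


def simulate(config):
    """Hypothetical stand-in for one expensive architecture evaluation."""
    log_cache, log_bpu = config
    # Toy cost surface with its minimum at (5, 3); not data from the dissertation.
    return (log_cache - 5) ** 2 + 0.5 * (log_bpu - 3) ** 2


def kernel_estimate(point, samples, bandwidth=1.0):
    """Nadaraya-Watson estimate: Gaussian-weighted average of the sampled costs."""
    num = 0.0
    den = 0.0
    for cfg, cost in samples:
        dist2 = sum((a - b) ** 2 for a, b in zip(point, cfg))
        weight = math.exp(-dist2 / (2 * bandwidth ** 2))
        num += weight * cost
        den += weight
    return num / den


def explore(space, n_samples, seed=0):
    """Evaluate a small random sample, then rank every point with the cheap model."""
    rng = random.Random(seed)
    sampled = rng.sample(space, n_samples)
    samples = [(cfg, simulate(cfg)) for cfg in sampled]  # the only expensive calls
    best = min(space, key=lambda cfg: kernel_estimate(cfg, samples))
    return best, samples


# Assumed design space: 11 x 7 = 77 candidate configurations.
space = list(itertools.product(range(11), range(7)))
best, samples = explore(space, n_samples=15)
print("estimated best configuration:", best)
print("expensive evaluations used:", len(samples), "of", len(space))
```

Only the sampled configurations incur the expensive evaluation; every other point is estimated from the model. The quality of the selected configuration depends on the sample size and the kernel bandwidth, both of which are arbitrary choices in this sketch.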
