Custom-fit processors: letting applications define architectures

In this paper we report on a system that automatically designs realistic VLIW architectures highly optimized for one given application (the input to the system), while still running all other code correctly. The system uses a product-quality compiler that generates very aggressive VLIW code. We retarget the compiler repeatedly until we find the VLIW architecture best suited to the application, judged by performance, a cost function, and a hardware budget. We show that we can automatically select architectures that achieve large speedups on color and image processing codes. Specialization is shown to be very valuable: the differences between architectural choices, even among reasonable-seeming architectures of similar cost, can be very great, often a factor of 5 (and sometimes much more). We also show that specialization is very dangerous: a reasonable choice of architecture to fit one algorithm can be a very poor choice for another, even in the same domain. There is sometimes an architecture, near in cost and performance to the best, that does much better on a second algorithm.
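The search described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the paper: the `Arch` fields, the linear `cost` weights, the candidate grid, and above all `make_cycle_model`, a toy resource-bound estimate standing in for the real step of retargeting the compiler, compiling the application, and measuring its performance.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Arch:
    """Hypothetical VLIW description; the fields are illustrative only."""
    alus: int
    multipliers: int
    mem_ports: int


def cost(arch: Arch) -> float:
    """Toy linear cost function (made-up weights, not taken from the paper)."""
    return 1.0 * arch.alus + 3.0 * arch.multipliers + 2.0 * arch.mem_ports


def make_cycle_model(alu_ops: int, mul_ops: int, mem_ops: int) -> Callable[[Arch], float]:
    """Stand-in for 'retarget the compiler, compile, and count cycles'.

    Assumes run time is bounded by the most contended resource.
    """
    def cycles(arch: Arch) -> float:
        return max(alu_ops / arch.alus,
                   mul_ops / arch.multipliers,
                   mem_ops / arch.mem_ports)
    return cycles


def candidates() -> Iterable[Arch]:
    """Enumerate a small, assumed design space."""
    for a, m, p in product((1, 2, 4, 8), (1, 2, 4), (1, 2, 4)):
        yield Arch(a, m, p)


def best_arch(cycles: Callable[[Arch], float], budget: float) -> Optional[Arch]:
    """Pick the fastest architecture within budget; break ties by lower cost."""
    feasible = [a for a in candidates() if cost(a) <= budget]
    if not feasible:
        return None
    return min(feasible, key=lambda a: (cycles(a), cost(a)))


# Two hypothetical applications in the same domain with different operation mixes.
app_a = make_cycle_model(alu_ops=800, mul_ops=100, mem_ops=200)
app_b = make_cycle_model(alu_ops=100, mul_ops=800, mem_ops=200)

best_a = best_arch(app_a, budget=16)
best_b = best_arch(app_b, budget=16)

# How much slower app_b runs on the architecture specialized for app_a.
slowdown_b_on_a = app_b(best_a) / app_b(best_b)
print(best_a, best_b, slowdown_b_on_a)
```

Under these made-up numbers the two selected architectures have the same cost, yet the one fitted to the first application runs the second several times slower than the one fitted to the second. This mirrors, in toy form only, the claim that specialization is both valuable and dangerous.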
