Hardware/compiler codevelopment for an embedded media processor

Embedded and portable systems running multimedia applications create a new challenge for hardware architects. A microprocessor for such applications needs to be easy to program like a general-purpose processor and have the performance and power efficiency of a digital signal processor. This paper presents the codevelopment of the instruction set, the hardware, and the compiler for the Vector IRAM media processor. A vector architecture is used to exploit the data parallelism of multimedia programs, which allows the use of highly modular hardware and enables implementations that combine high performance, low power consumption, and reduced design complexity. It also leads to a compiler model that is efficient both in terms of performance and executable code size. The memory system for the vector processor is implemented using embedded DRAM technology, which provides high bandwidth in an integrated, cost-effective manner. The hardware and the compiler for this architecture make complementary contributions to the efficiency of the overall system. This paper explores the interactions and tradeoffs between them, as well as the enhancements to a vector architecture necessary for multimedia processing. We also describe how the architecture, design, and compiler features come together in a prototype system-on-a-chip, able to execute 3.2 billion operations per second per watt.
