The TM3270 media-processor

We present the TM3270 media-processor, the latest TriMedia VLIW processor, tuned to address the performance demands of standard-definition video processing while meeting the embedded-processor requirements of the consumer market. We discuss its architecture, its implementation, and its first realization in a 90 nm process technology. The processor incorporates instruction set architecture (ISA) extensions and a load/store unit optimized for the video-processing domain. The ISA extensions improve performance on video-processing kernels, and the data cache policies and prefetching techniques allow for efficient access to multimedia data. Finally, we present power consumption and performance data.
