Overview of research efforts on media ISA extensions and their usage in video coding

This paper summarizes the results of more than 25 research groups and individual researchers who have presented video coding implementations on general-purpose processors using the new single instruction, multiple data (SIMD) media instruction set architecture (ISA) extensions. The extensions are introduced, and their fundamentals, as well as some of their inherent problems, are explained. The reported attempts to utilize the extensions are divided into kernel-level and application-level optimizations, and into platform-dependent and platform-independent optimizations. In addition to some proprietary methods, the optimized applications cover all of the major video coding standards: H.261, H.263, MPEG-1, MPEG-2, and MPEG-4. These optimized implementations include a complete video codec, several decoders, and several encoders. Additionally, a performance comparison of four representative encoder implementations is given, based on the reported results. The paper also gives an overview of future trends in new instructions and architectural speed-up techniques.
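To make sub-word parallelism and kernel-level optimization concrete, here is a minimal sketch of one typical target kernel: the 16x16 sum of absolute differences (SAD) used in block-matching motion estimation. It is written in portable C and only emulates, lane by lane, what a packed instruction does in a single operation. The original MMX instruction set has no packed absolute-difference instruction, so a well-known idiom computes |a - b| for unsigned bytes as the OR of two saturating subtractions (PSUBUSB). All function names here are hypothetical, and the code is an illustration, not taken from any of the surveyed implementations.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: loads eight consecutive 8-bit pixels into one
   64-bit word, pixel p[i] in byte lane i (the layout an MMX MOVQ load
   produces on x86). */
static uint64_t load8_u8(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Emulates packed unsigned saturating byte subtraction (the semantics of
   MMX PSUBUSB): each lane gets max(a - b, 0). Hardware handles all eight
   lanes in one instruction; this loop only models the result. */
static uint64_t packed_subus_u8(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned x = (unsigned)(a >> (8 * lane)) & 0xFFu;
        unsigned y = (unsigned)(b >> (8 * lane)) & 0xFFu;
        uint64_t d = (x > y) ? (x - y) : 0u; /* clamp negatives to 0 */
        r |= d << (8 * lane);
    }
    return r;
}

/* Adds up the eight byte lanes of a word. Later extensions fold the
   absolute difference and this horizontal sum into one instruction
   (PSADBW). */
static unsigned sum_u8_lanes(uint64_t v) {
    unsigned s = 0;
    for (int lane = 0; lane < 8; lane++)
        s += (unsigned)(v >> (8 * lane)) & 0xFFu;
    return s;
}

/* SAD of a 16x16 block, processed as two 8-pixel words per row. */
unsigned sad16x16_packed(const uint8_t *cur, const uint8_t *ref, int stride) {
    unsigned sad = 0;
    for (int row = 0; row < 16; row++) {
        for (int col = 0; col < 16; col += 8) {
            uint64_t a = load8_u8(cur + row * stride + col);
            uint64_t b = load8_u8(ref + row * stride + col);
            /* For unsigned bytes one of the two saturated differences is
               zero, so OR-ing them yields |a - b| in every lane. */
            uint64_t absdiff = packed_subus_u8(a, b) | packed_subus_u8(b, a);
            sad += sum_u8_lanes(absdiff);
        }
    }
    return sad;
}

/* Plain scalar version, used as a reference for the packed one. */
unsigned sad16x16_scalar(const uint8_t *cur, const uint8_t *ref, int stride) {
    unsigned sad = 0;
    for (int row = 0; row < 16; row++)
        for (int col = 0; col < 16; col++)
            sad += (unsigned)abs(cur[row * stride + col] - ref[row * stride + col]);
    return sad;
}
```

Both functions should return the same value for any pair of blocks; for example, a block of all 10s compared with a block of all 3s gives 256 × 7 = 1792. On an actual SIMD extension each lane loop collapses into a single packed instruction, so a row costs a few packed loads, subtractions, ORs, and adds instead of 16 scalar subtract, absolute-value, and add sequences.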
