Performance of image and video processing with general-purpose processors and media ISA extensions

This paper provides a quantitative understanding of the performance of image and video processing applications on general-purpose processors, with and without media ISA extensions. We use detailed simulation of 12 benchmarks to study the effectiveness of current architectural features and to identify future challenges for these workloads.

Our results show that conventional techniques used in current processors to enhance instruction-level parallelism (ILP) improve performance by a factor of 2.3X to 4.2X. The Sun VIS media ISA extensions provide an additional 1.1X to 4.2X improvement. Together, the ILP features and media ISA extensions significantly reduce the CPU component of execution time, making 5 of the image processing benchmarks memory-bound.

The memory behavior of our benchmarks is characterized by large working sets and streaming data accesses. Increasing the cache size has no impact on 8 of the benchmarks. The remaining benchmarks require relatively large caches (with the required size depending on the display size) to exploit data reuse, but gain less than 1.2X in performance from the larger caches. Software prefetching provides a 1.4X to 2.5X performance improvement in the image processing benchmarks where memory is a significant problem. With software prefetching added, all our benchmarks revert to being compute-bound.
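To illustrate what software prefetching looks like in a streaming image kernel of the kind described above, here is a minimal sketch in C. It is not taken from the paper's benchmarks: the function name add_images_sat, the prefetch distance PREFETCH_AHEAD, and the 64-byte cache line size are all assumptions chosen for illustration, and __builtin_prefetch is the GCC/Clang compiler builtin rather than the mechanism used in the study.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance in bytes. A real value would be tuned to
   the memory latency and per-iteration work of the target machine. */
#define PREFETCH_AHEAD 256

/* Assumed cache line size in bytes; used only to limit prefetch frequency. */
#define LINE_SIZE 64

/* Saturating add of two 8-bit images. Each pixel is read once and never
   reused, which is the streaming access pattern that caches handle poorly. */
void add_images_sat(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Once per assumed cache line, request data that will be needed
           PREFETCH_AHEAD bytes from now. Arguments: address, 0 = read,
           0 = no temporal locality (the data is used once). */
        if ((i % LINE_SIZE) == 0 && i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0);
            __builtin_prefetch(&b[i + PREFETCH_AHEAD], 0, 0);
        }
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum); /* clamp to 8 bits */
    }
}
```

The prefetch is only a hint and does not change the computed result; whether it helps depends on the machine. In a hand-tuned kernel the loop would typically be unrolled by one cache line so that the per-iteration test disappears.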
