DLP+TLP processors for the next generation of media workloads

Future media workloads will require about two orders of magnitude more performance than current general-purpose processors deliver. High single-threaded performance will be needed to meet real-time constraints, together with very high computational throughput, because the next generation of media workloads will be predominantly multithreaded (MPEG-4/MPEG-7). To meet the challenge of providing both good single-threaded performance and throughput, we propose combining the simultaneous multithreading (SMT) execution paradigm with the ability to execute media-oriented streaming μ-SIMD instructions. This paper evaluates the performance of two aggressive SMT processors: one with conventional μ-SIMD extensions (such as MMX) and one with longer streaming vector μ-SIMD extensions. We show that future media workloads are, in fact, dominated by scalar performance. The combination of SMT and streaming vector μ-SIMD helps alleviate the performance bottleneck of the integer unit. SMT allows vector execution to be "hidden" underneath integer execution by overlapping the two types of computation, while streaming vector μ-SIMD reduces the pressure on issue width and fetch bandwidth and provides a powerful latency-tolerance mechanism that makes it possible to implement smart decoupled cache hierarchies.
