An Evaluation of Different DLP Alternatives for the Embedded Media Domain

The importance of media processing has driven a revolution in the design of embedded processors. To meet the high computational and technological demands of near-future media applications, new embedded processors are incorporating features that were formerly restricted to the general-purpose and supercomputing domains. In this paper we evaluate the performance of several DLP (Data-Level Parallelism) oriented embedded architectures and analyze quantitative data to determine the strengths and weaknesses of each approach. We also analyze the differences between explicitly parallel versions of code (often based on the standard algorithms) and the highly tuned, non-vectorizable versions usually found in real multimedia programs. We show that sub-word SIMD architectures (such as MMX) are a very cost-effective solution and that, while long-vector architectures provide little improvement at a very high cost, a careful combination of vector and SIMD-like architectures delivers the best performance at a reasonable cost. We also show that the memory-latency tolerance typical of vector architectures is partially offset by the poorer spatial locality observed when executing vector code.
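To make the two levels of DLP concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the evaluated implementation. Sub-word SIMD packs several narrow elements (here eight 8-bit pixels) into one 64-bit word and operates on all of them at once, emulated below with the SWAR ("SIMD within a register") bit trick. A vector architecture instead applies one instruction to a sequence of up to MVL elements. One plausible reading of combining the two, assumed here, is a vector whose elements are themselves packed words. The names `pack_u8`, `unpack_u8`, `swar_add_u8`, `vector_add_u8` and the value `MVL = 4` are hypothetical choices for illustration, and the lane arithmetic assumes wraparound (modular) semantics like MMX `PADDB` rather than saturating adds.

```python
# Minimal sketch (hypothetical names, not the paper's code): sub-word SIMD
# emulated with SWAR on 64-bit words, plus a strip-mined "vector" loop
# whose elements are themselves packed words.

LOW_BITS = 0x7F7F7F7F7F7F7F7F   # low 7 bits of each 8-bit lane
HIGH_BITS = 0x8080808080808080  # top bit of each 8-bit lane
MVL = 4  # hypothetical maximum vector length, in 64-bit packed words


def pack_u8(lanes):
    """Pack eight values (0..255) into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_u8(word):
    """Split a 64-bit word into its eight 8-bit lanes, lane 0 first."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def swar_add_u8(a, b):
    """Add eight packed unsigned bytes with per-lane wraparound (mod 256),
    the semantics of a modular packed add such as MMX PADDB."""
    # Each 7-bit lane sum is at most 254, so no carry crosses a lane boundary.
    low = (a & LOW_BITS) + (b & LOW_BITS)
    # Restore each lane's top bit; any carry out of bit 7 is simply dropped.
    return low ^ ((a ^ b) & HIGH_BITS)


def vector_add_u8(a, b):
    """Strip-mined loop: each outer iteration models one vector instruction
    of length vl <= MVL, and each of its elements is a packed SWAR add."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    out = []
    for start in range(0, len(a), MVL):
        vl = min(MVL, len(a) - start)  # vector length for this strip
        for j in range(start, start + vl):
            out.append(swar_add_u8(a[j], b[j]))
    return out


if __name__ == "__main__":
    a = pack_u8([0x10, 0, 0, 0, 0x7F, 0x80, 0x01, 0xFF])
    b = pack_u8([0x20, 0, 0, 1, 0x81, 0x80, 0x01, 0x01])
    print(unpack_u8(swar_add_u8(a, b)))  # -> [48, 0, 0, 1, 0, 0, 2, 0]
    # One 8-lane packed op per word, up to MVL words per "vector instruction".
    print(len(vector_add_u8([a] * 10, [b] * 10)))  # -> 10
```

The packed add only keeps carries from crossing lane boundaries; real media extensions also provide saturating variants, which this sketch does not model.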
