decoding has been extensively studied in the past, it continues to gain importance as a key workload underlying many present and emerging applications. Additionally, the emerging video coding standard MPEG-4 Part 10, also known as H.264, has some new features that impact the whole system performance. In this paper, we address the characterization of MPEG as well as H.264 decoding on current state-of-the-art superscalar and simultaneous multithreaded (SMT) micro-architectures, discussing both application-level behavior and the key kernels in the applications, e.g., variable-length decoding, IDCT, deblocking filter, and motion compensation. We also address the effectiveness of a number of current micro-architectural enhancements for speeding up this workload.

I. INTRODUCTION

As the computing power available to users increases and rich content becomes prevalent, multimedia workloads assume growing importance during processor design and overall system performance assessment. One workload of particular importance is MPEG decoding, which is encountered not only as the basis of standalone applications such as DVD or HDTV playback, but also as a key underlying component in even more demanding applications such as interactive video, video editing, and so forth [20].

To date, computational power has typically increased over time through the evolution from simple pipelined designs to the complex speculation and out-of-order execution of many of today's deeply pipelined superscalar designs. However, while single-threaded processors are now much faster than they used to be, the rapidly growing complexity of such designs also makes achieving significant new gains ever more difficult.

This work will first describe the workload characterization of MPEG decoding on current superscalar architectures, and then characterize the same workload on simultaneous multi-threading (SMT) architectures [17].
Specifically, we use Intel® processors with Hyper-Threading Technology [14], which is one implementation of the SMT architecture. The MPEG and H.264 decoders we use as benchmarks are heavily optimized using the latest ISA extensions [20][21]. For our performance analysis, we use a commercial software analysis tool and the performance counters available on today's processors [1][3][7].

The paper is organized as follows. In Section II, we provide a brief review of the basic principles behind most current video codecs, describing particularly the well-established MPEG-2 standard and the rapidly emerging MPEG-4 Part 10 (also known as H.264) standard. Section III provides an
[1] Thomas Sikora et al., "MPEG digital video-coding standards," IEEE Signal Process. Mag., 1997.
[2] David J. Lilja et al., "Data prefetch mechanisms," CSUR, 2000.
[3] "ISO/IEC 14496-2 Information Technology — Coding of Audio-visual Objects — Part 2: Visual," 2022.
[4] Marc Atkins et al., "PC Software Performance Tuning," Computer, 1996.
[5] David J. Sager et al., "The microarchitecture of the Pentium 4 processor," 2001.
[6] Rohan Coelho et al., "DirectX, RDX, RSX, and MMX Technology: A Jumpstart Guide to High Performance APIs," 1998.
[7] Yen-Kuang Chen et al., "Implementation of H.264 decoder on general-purpose processors with media instructions," IS&T/SPIE Electronic Imaging, 2003.
[8] Arun N. Netravali et al., "Digital Video: An introduction to MPEG-2," 1996.
[9] Dean M. Tullsen et al., "Simultaneous multithreading: Maximizing on-chip parallelism," Proceedings 22nd Annual International Symposium on Computer Architecture, 1995.