MPEG Decoding Workload Characterization

Although MPEG decoding has been extensively studied in the past, it continues to gain importance as a key workload underlying many present and emerging applications. Additionally, the emerging video coding standard MPEG-4 Part 10, also known as H.264, introduces new features that affect overall system performance. In this paper, we characterize MPEG and H.264 decoding on current state-of-the-art superscalar and simultaneous multithreaded (SMT) micro-architectures, discussing both application-level behavior and the key kernels in the applications, e.g., variable-length decoding, IDCT, deblocking filter, and motion compensation. We also address the effectiveness of a number of current micro-architectural enhancements for speeding up this workload.

I. INTRODUCTION

As the computing power available to users increases and rich content becomes prevalent, multimedia workloads assume growing importance in processor design and in overall system performance assessment. One workload of particular importance is MPEG decoding, which is encountered not only as the basis of standalone applications such as DVD or HDTV playback, but also as a key underlying component in even more demanding applications such as interactive video and video editing [20].

To date, computational power has typically increased through the evolution from simple pipelined designs to the complex speculation and out-of-order execution found in many of today's deeply pipelined superscalar designs. However, while single-threaded processors are now much faster than they used to be, the rapidly growing complexity of such designs makes significant new gains ever more difficult to achieve.

This work first characterizes MPEG decoding on current superscalar architectures, and then characterizes the same workload on simultaneous multithreading (SMT) architectures [17].
Specifically, we use Intel® processors with Hyper-Threading Technology [14], which is one implementation of the SMT architecture. The MPEG and H.264 decoders we use as benchmarks are heavily optimized using the latest ISA extensions [20][21]. For our performance analysis, we use a commercial software analysis tool and the performance counters available on today's processors [1][3][7].

The paper is organized as follows. In Section II, we provide a brief review of the basic principles behind most current video codecs, describing in particular the well-established MPEG-2 standard and the rapidly emerging MPEG-4 Part 10 (also known as H.264) standard. Section III provides an