Parallelizing MPEG Decoder with Scalable Streaming Computation Kernels

In this paper, we describe a scalable and portable parallel implementation of an MPEG decoder based on a streaming computation paradigm and tailored to new generations of multi-core systems. We describe a novel hybrid approach to parallelizing both new and legacy applications, in which only the data-intensive and performance-critical parts are implemented in the streaming domain. The architecture-independent StreamIt language is used to design, optimize and implement the parallelized segments, while the StreamGate interface that we developed provides a communication mechanism between the implementation domains. We applied the proposed hybrid approach to refactor a reference MPEG video decoder implementation: we identified the most performance-critical segments and re-implemented them in StreamIt, with the StreamGate interface as the communication mechanism between the host and the streaming kernel. We evaluated the scalability of the decoder with respect to the number of cores, video frame formats, sizes and decomposition. Decoder performance was examined under different processor load configurations and with respect to the number of simultaneously processed frames.
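The split between a host application and a streaming kernel can be illustrated with a minimal sketch. It is not the StreamGate interface, whose actual API is not given here, and it is written in Python rather than StreamIt. The pair of queues stands in for the communication mechanism, and the names `dequantize`, `saturate`, `streaming_kernel` and `decode_blocks`, along with the scale factor and clamp range, are hypothetical choices made only for illustration.

```python
import queue
import threading

# Hypothetical per-block filters standing in for performance-critical stages.
def dequantize(block, scale=2):
    """Multiply every coefficient by an assumed quantizer scale."""
    return [c * scale for c in block]

def saturate(block, lo=-256, hi=255):
    """Clamp every coefficient to an assumed legal range."""
    return [max(lo, min(hi, c)) for c in block]

PIPELINE = [dequantize, saturate]  # filters applied in order, like a stream pipeline
_STOP = object()                   # sentinel that marks the end of the stream

def streaming_kernel(inbox, outbox):
    """Streaming side: pull blocks, run them through the pipeline, push results."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)
            return
        for stage in PIPELINE:
            item = stage(item)
        outbox.put(item)

def decode_blocks(blocks):
    """Host side: hand blocks to the kernel and collect the results in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=streaming_kernel, args=(inbox, outbox))
    worker.start()
    for block in blocks:
        inbox.put(block)
    inbox.put(_STOP)
    results = []
    while True:
        item = outbox.get()
        if item is _STOP:
            break
        results.append(item)
    worker.join()
    return results
```

The host code never calls the filters directly; it only writes to and reads from the two queues, which is the property the hybrid approach relies on when legacy code is left outside the streaming domain. The sketch shows the structure only and makes no claim about speedup, since a single Python worker thread does not execute the filters in parallel.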
