A Hardware Task Scheduler for Embedded Video Processing

Modern embedded Systems-on-a-Chip deploy multiple programmable cores to meet the increasing performance requirements of video, graphics, and modem applications. However, software implementations of task scheduling and inter-task synchronization often limit the performance gains of multicores. Remarkably, several demanding video applications (e.g., H.264 video decoding) rely on task dependency graphs that can be constructed from a simple dependency pattern. Based on such a pattern, our novel hardware task scheduler can quickly create, order, synchronize, and map tasks to cores. We found that our hardware task scheduler speeds up Quad HD H.264 video decoding by a factor of 1.17 compared to a chip multiprocessor with state-of-the-art hardware task queues. Moreover, our hardware task scheduler reduces the number of cores needed to meet the real-time performance requirements of the H.264 decoder and, consequently, reduces the silicon area of the multicore by up to 12.5%.
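To illustrate how a whole task graph can follow from one small dependency pattern, here is a minimal software sketch. It is not the hardware scheduler described above. It assumes the commonly used H.264 macroblock pattern in which each block waits on its left, upper-left, upper, and upper-right neighbours; the abstract does not spell out the pattern, and the names `PATTERN` and `schedule` are hypothetical.

```python
# Assumed dependency pattern: (dx, dy) offsets of the blocks a task waits on
# (left, upper-left, upper, upper-right). Illustrative, not taken from the paper.
PATTERN = [(-1, 0), (-1, -1), (0, -1), (1, -1)]


def schedule(width, height, pattern=PATTERN):
    """Group the tasks of a width x height grid into steps.

    Tasks within one step have no dependencies on each other, so they could
    be mapped to different cores and run in parallel.
    """
    def in_grid(x, y):
        return 0 <= x < width and 0 <= y < height

    # Number of unfinished dependencies per task, derived from the pattern.
    pending = {
        (x, y): sum(in_grid(x + dx, y + dy) for dx, dy in pattern)
        for y in range(height)
        for x in range(width)
    }

    ready = [task for task, count in pending.items() if count == 0]
    steps = []
    while ready:
        steps.append(sorted(ready))
        next_ready = []
        for x, y in ready:
            # Successors are found by applying the pattern in reverse.
            for dx, dy in pattern:
                succ = (x - dx, y - dy)
                if succ in pending:
                    pending[succ] -= 1
                    if pending[succ] == 0:
                        next_ready.append(succ)
        ready = next_ready
    return steps


if __name__ == "__main__":
    for i, step in enumerate(schedule(4, 3)):
        print(i, step)
```

Under this assumed pattern the tasks become ready in diagonal wavefronts, so the available parallelism grows and shrinks as decoding moves across the frame. Dependency counting and release are the bookkeeping that a software scheduler would run for every task.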
