An Efficient Application Processor Architecture for Multicore Software Video Decoding

In this paper, we propose a new multicore application processor architecture that facilitates the adoption of fine-granularity software-pipeline parallelism without placing an extra burden on the system bus. The proposed system-on-a-chip architecture efficiently supports both traditional symmetric multiprocessor (SMP) applications and the proposed software-pipeline applications at the same time. Its programming model is compatible with existing SMP operating systems. To implement pipeline-based parallelism, we suggest new programmer-friendly system calls that take advantage of the new software-pipeline datapath. The proposed architecture, with four reduced instruction set computing cores, is implemented on a field-programmable gate array development board for verification. An Advanced Video Coding/H.264 baseline profile video decoder that exploits pipeline parallelism with dynamic pipeline-stage partitioning is implemented on the target platform to demonstrate the benefits of the proposed architecture. Experimental results show that adding the proposed pipeline datapath to existing application processors opens new possibilities for exploiting software parallelism.
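
To make the notion of software-pipeline parallelism concrete, the following minimal sketch connects three worker threads with bounded queues so that each item flows through the stages in order. It is an illustration only and does not reproduce the proposed datapath or system calls: the stage functions `parse`, `reconstruct`, and `deblock`, the helper `run_pipeline`, and the `depth` parameter are hypothetical names, and the stages operate on plain integers rather than video data.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker passed from stage to stage


def stage_worker(work, inbox, outbox):
    """Run one pipeline stage: apply `work` to each item and pass it on."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # forward the marker so later stages stop too
            return
        outbox.put(work(item))


def run_pipeline(stages, items, depth=2):
    """Push `items` through `stages` (a list of callables), one thread per stage.

    Inter-stage queues are bounded to `depth` entries (an assumed value);
    the final output queue is unbounded so the feeder below cannot deadlock.
    """
    queues = [queue.Queue(maxsize=depth) for _ in stages] + [queue.Queue()]
    threads = [
        threading.Thread(target=stage_worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)  # blocks while the first stage is busy
    queues[0].put(SENTINEL)
    for t in threads:
        t.join()
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            return results
        results.append(out)


# Hypothetical stand-ins for decoder stages; real stages would process macroblocks.
def parse(x):
    return x + 1


def reconstruct(x):
    return x * 2


def deblock(x):
    return x - 3


if __name__ == "__main__":
    print(run_pipeline([parse, reconstruct, deblock], range(5)))
```

Because each stage is a single thread reading from a first-in, first-out queue, output order matches input order. In CPython the global interpreter lock prevents a real speedup for compute-bound stages, so the sketch shows only the structure of a software pipeline, not its performance.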
