USHA: Unified software and hardware architecture for video decoding

Video decoders used in emerging applications need to be flexible enough to handle a large variety of video formats and to deliver scalable performance across wide variations in workload. In this paper, we propose a unified software and hardware architecture for video decoding that achieves scalable performance with flexibility. The lightweight processor tiles and the reconfigurable hardware tiles in our architecture enable software and hardware implementations to co-exist, while a programmable interconnect enables dynamic interconnection of the tiles. Our process-network-oriented compilation flow achieves realization-agnostic application partitioning and enables seamless migration across uniprocessor, multi-processor, semi-hardware and full-hardware implementations of a video decoder. An application quality-of-service-aware scheduler monitors and controls the operation of the entire system. We prove the concept through a prototype of the architecture on an off-the-shelf FPGA. The FPGA prototype shows performance scaling from QCIF to 1080p resolutions in four discrete steps. We also demonstrate that the reconfiguration time is short enough to allow migration from one configuration to another without any frame loss.
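To make the idea of realization-agnostic partitioning more concrete, the following is a minimal sketch of a process network in Python, not the compilation flow or the tile architecture described here. The stage functions (`parse`, `transform`, `reconstruct`), the `END` marker and the helper names are hypothetical placeholders introduced only for illustration. The sketch assumes each process communicates solely through FIFO channels with blocking reads, so the same partitioning can be run either as one sequential loop or as concurrent processes and yield the same output.

```python
import queue
import threading

END = object()  # end-of-stream marker (an assumption of this sketch)


def stage(fn, inp, out):
    """One process: blocking read from its input FIFO, write to its output FIFO."""
    while True:
        token = inp.get()  # blocks until a token is available
        if token is END:
            out.put(END)   # propagate end-of-stream downstream
            return
        out.put(fn(token))


def run_network(fns, tokens):
    """Run each stage as its own thread, connected by unbounded FIFOs."""
    chans = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, chans[i], chans[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for tok in tokens:
        chans[0].put(tok)
    chans[0].put(END)
    results = []
    while True:
        tok = chans[-1].get()
        if tok is END:
            break
        results.append(tok)
    for t in threads:
        t.join()
    return results


def run_sequential(fns, tokens):
    """Run the same stages in a single loop, as a uniprocessor realization might."""
    results = []
    for tok in tokens:
        for fn in fns:
            tok = fn(tok)
        results.append(tok)
    return results


# Placeholder stages: simple arithmetic stands in for real decoding work.
def parse(x):
    return x + 1


def transform(x):
    return x * 2


def reconstruct(x):
    return x - 3


STAGES = [parse, transform, reconstruct]

if __name__ == "__main__":
    print(run_sequential(STAGES, range(5)))
    print(run_network(STAGES, range(5)))
```

Because the stages share no state and interact only through the channels, swapping `run_sequential` for `run_network` changes how the work is mapped but not what it computes. That property is what allows a partitioning to be decided independently of how each process is eventually realized.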
