Merge Mode Estimation for a Hardware-Based HEVC Encoder

High Efficiency Video Coding (HEVC) is a video coding standard that offers higher coding performance than previous standards such as H.264/AVC. Merge mode is one of the new tools adopted in HEVC to improve inter-frame coding efficiency. It saves the bits otherwise spent on the motion vector (MV) by sharing the MV with neighboring blocks. Merge mode estimation (MME) is the process of finding a merge mode candidate, and it requires extensive computation and memory access because of the associated motion compensation. Although MME resembles motion estimation (ME) in many ways, previous research on ME cannot be applied directly to many of the difficulties in designing MME hardware. This paper discusses the characteristics and the computational complexity of MME. To improve the throughput of the MME hardware, partially increased parallelism is exploited. Furthermore, M-of-N-pixel combination and flexible memory access schemes are proposed to maximize scalability across the various block sizes of HEVC and to reduce the time spent fetching reference data. The proposed schemes are applied to the MME hardware design presented in this paper. The proposed hardware can process 56,074 coding tree units (CTUs) of 64 × 64 pixels per second at a clock frequency of 366 MHz, and its gate count is 585.4k with 2 kB of dual-port static RAM.
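To put the reported throughput in context, the following is a minimal sketch that converts the two figures quoted above (56,074 CTUs per second and 366 MHz) into an average cycle budget per CTU and an equivalent frame rate. The frame sizes, the function names, and the assumption that a frame is covered by whole 64 × 64 CTUs are illustrative choices that do not come from the paper, and MME is only one stage of a full encoder.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 366_000_000      # 366 MHz clock frequency
CTUS_PER_SECOND = 56_074    # reported MME throughput
CTU_SIZE = 64               # 64 x 64 coding tree unit


def cycles_per_ctu(clock_hz=CLOCK_HZ, ctus_per_second=CTUS_PER_SECOND):
    """Average number of clock cycles available for one CTU."""
    return clock_hz / ctus_per_second


def ctus_per_frame(width, height, ctu_size=CTU_SIZE):
    """CTUs needed to cover a frame (assumption: partial CTUs at the
    right and bottom edges count as whole CTUs)."""
    cols = -(-width // ctu_size)   # ceiling division
    rows = -(-height // ctu_size)
    return cols * rows


def frames_per_second(width, height, ctus_per_second=CTUS_PER_SECOND):
    """Frame rate the reported CTU throughput would sustain."""
    return ctus_per_second / ctus_per_frame(width, height)


if __name__ == "__main__":
    print(f"cycles per CTU: {cycles_per_ctu():.0f}")
    # Frame sizes below are illustrative assumptions, not from the paper.
    for w, h in [(1920, 1080), (3840, 2160)]:
        print(f"{w}x{h}: {ctus_per_frame(w, h)} CTUs/frame, "
              f"{frames_per_second(w, h):.1f} frames/s")
```

Under these assumptions the budget is roughly 6,527 cycles per CTU, and a 1920 × 1080 frame needs 510 CTUs, which corresponds to about 110 frames per second for the MME stage alone.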
