Low complexity hardware oriented H.264/AVC motion estimation algorithm and related low power and low cost architecture design

The ever-increasing bit-rate of network applications such as digital television broadcasting demands more storage capacity than ever before. In particular, the advent of Super Hi-Vision (SHV), with its high resolution, further intensifies the problem. Because network bandwidth and disk storage are limited, video compression is becoming more important than ever. As the latest video coding standard, H.264/AVC provides superior performance to previous standards. However, it also entails very high computational complexity. When an ASIC (Application-Specific Integrated Circuit) based real-time hardware system is considered, the intensive complexity of H.264/AVC causes problems in hardware cost and power consumption. To solve this problem, this dissertation focuses on two key issues: low complexity hardware oriented algorithms and their related architectures. In an H.264/AVC based system, motion estimation (ME), which is the major part of inter prediction, is the most significant component. It consists of integer ME (IME) and fractional ME (FME) and accounts for almost 90% of the computation, which makes it necessary to split IME and FME into two separate stages in a real-time hardwired encoder. Besides motion estimation, the hardware engine for intra prediction is another time-consuming part because of its many prediction modes. Moreover, the rate distortion based mode decision, which makes the final judgment between inter and intra modes, also consumes a lot of computation in the final stage of the whole encoding system. Many software based fast algorithms have already been proposed to reduce the complexity of H.264/AVC based systems. However, most of these algorithms cannot be efficiently realized in hardware because of constraints in hardware design. In hardware, factors such as predictable data flow, regular memory access and full hardware utilization are important to the whole system's performance.
If these factors are ignored, hardware cost and power consumption increase greatly and throughput degrades. Hardware oriented low complexity algorithms and the related low cost, low power hardware architectures are therefore important issues in H.264/AVC based real-time encoder design. Based on an analysis of existing works and current problems, this dissertation targets a low cost and low power H.264/AVC real-time hardwired encoder. In detail, it focuses on IME, FME, intra prediction and mode decision, which are the four computation intensive parts of an H.264/AVC based system. Firstly, low complexity algorithms that follow the hardware data flow are proposed. Secondly, based on the proposed algorithms, flexible and highly parallel architectures are presented. Moreover, architecture and circuit optimizations are proposed to further reduce hardware cost and power consumption. The dissertation consists of six chapters, as follows. The first chapter gives an introduction to the video compression field. The development and features of video coding standards and the emphasis of this dissertation are described in detail. The second chapter presents hardware oriented low complexity motion estimation algorithms. Complexity reduction is achieved in the multiple reference frames (MRF) technique, the search range and the matching pattern of an H.264/AVC based system. Firstly, for the MRF technique, gradient and block matching information are used for fast MRF algorithms. The proposed algorithms reduce the MRF complexity according to macroblock (MB) features in the spatial and temporal domains. Secondly, statistical analysis shows that motion features are consistent across several frames and that the search range can be adaptively adjusted according to the motion feature of the MB. Two search range adjustment schemes are therefore proposed in this dissertation. For MBs with extremely small motion, the search range is restricted to 1/8 of its original value.
For other MBs, the search range is adjusted recursively according to the motion feature of the MB in the previous frame. Thirdly, since pixel difference reflects the spatial feature of the current MB, it is used to classify the matching pattern of the ME process. A pixel difference based adaptive sub-sampling scheme is proposed, which uses three hardware oriented patterns for MBs with different spatial features. By combining all the proposed schemes, the overall algorithm achieves up to 95.72% complexity reduction with an average 0.072 dB PSNR loss and 0.902% bit-rate increase based on the hardware data flow. In the third chapter, two flexible IME architectures for the adaptive sub-sampling algorithm, namely adaptive propagate partial SAD (APPSAD) and reconfigurable SAD Tree (RSADT), are proposed. By using a configurable SAD, the proposed RSADT architecture achieves data organization at both the architecture and memory levels, which shortens processing time and saves power. For APPSAD, the original processing element (PE) is expanded into four different types. According to the matching pattern, only the related type of PE is enabled, so the power consumption of the other PE types is saved. Moreover, circuit optimization is applied to both APPSAD and RSADT. The propagation chain, the original PE and the adder trees are simplified so that no redundant registers or adders remain. Hardware cost and power consumption are thus further reduced. With the TSMC 0.18 µm CMOS library, it is shown that the proposed architectures save 61.71% of processing cycles and reduce power by up to 39.8% compared with existing works. In the fourth chapter, two low design effort SHV engines for FME and intra prediction are proposed. Firstly, for the FME engine, two algorithm level optimizations, namely inter mode pre-filtering and a one-pass algorithm, are proposed.
Inter mode pre-filtering analyzes the motion cost of sub-blocks in the IME stage and keeps only the two modes whose cost is smaller than the others. The one-pass algorithm first decides the sliding window based on the integer motion cost of neighboring positions. Then, only the half and quarter pixels within the sliding window are processed, simultaneously, which saves hardware cost and processing time. At the hardware level, with a quarter sub-sampling technique in the FME stage, a 16-Pel interpolation structure is proposed, which is 4 times faster than the original 4-Pel design while keeping almost the same hardware cost. With an MB and frame level parallel processing flow, the proposed FME engine accomplishes real-time processing at only 145 MHz, compared with a representative design that requires 2.16 GHz for 4k×4k@60fps. For the intra engine, predictor generation is the most time-consuming part. Analysis of the data dependency in intra prediction shows that the maximum parallel processing scale is two sub-blocks rather than the original one sub-block at a time. In this dissertation, a lossless two sub-block parallel data flow is proposed, which saves 37.5% of the processing time of the original one sub-block flow. Also, in the original intra predictor generation engine, a lot of repetitive computation exists among different modes. In the proposed fully utilized intra predictor generation architecture, no predictor is generated repeatedly, and the architecture is applicable to all intra prediction modes. With the proposed architecture, the whole predictor generation process finishes in only 22.5% of the cycles of the original design. By combining the parallel data flow and the fully utilized architecture, the proposed intra predictor generation engine is capable of handling the 4k×2k@60fps specification. In the fifth chapter, the high complexity problem in H.264/AVC mode decision is discussed. By utilizing spatial and temporal information, complexity reduction is achieved in two stages.
Firstly, the gradients of the current MB and the motion vectors of encoded MBs in both the current and previous frames are used for a pre-stage skip mode check. Secondly, during the motion stage, it is observed that the motion vector predictor (MVP), the block overlapping status and the rate distortion cost can indicate the accuracy of the matching process. In detail, the MVP represents the accuracy of the predicted start point. The block overlapping status of different inter modes indicates the motion trend of the object. The rate distortion cost is an objective measurement of the matching result. Such information is therefore used for early decisions throughout the encoding process in the proposed mode decision algorithm. Compared with existing works, the proposed algorithm achieves a speed-up of up to 53.4% with negligible quality loss. In the sixth chapter, the dissertation is concluded and future trends in the video compression field are briefly discussed. This dissertation focuses on IME, FME, intra prediction and mode decision, which are the four most important parts of an H.264/AVC real-time encoding system. Hardware oriented low complexity algorithms and low cost, low power hardware architectures are proposed. By combining the hardware oriented algorithms with the proposed architectures, about 90.68% of the power in the IME part can be saved compared with a recent 4-stage real-time encoder design. For the SHV targeted FME and intra engines, the estimated power reductions in hardware design are about 93.31% and 67.24%, respectively.
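
The inter mode pre-filtering summarized for the fourth chapter can be illustrated with a minimal Python sketch. The function name `prefilter_inter_modes`, the mode labels and the cost values are hypothetical and chosen only for illustration; the abstract states only that the two modes with the smallest IME stage motion cost are passed on to FME.

```python
from typing import Dict, List


def prefilter_inter_modes(ime_costs: Dict[str, int], keep: int = 2) -> List[str]:
    """Return the `keep` inter modes with the smallest IME motion cost.

    `ime_costs` maps a (hypothetical) mode label to the motion cost that the
    IME stage accumulated for that mode. Only the returned modes would be
    refined by FME; the remaining modes are skipped.
    """
    if keep <= 0:
        raise ValueError("keep must be positive")
    # Sort modes by ascending cost; sorted() is stable, so ties keep input order.
    ranked = sorted(ime_costs.items(), key=lambda item: item[1])
    return [mode for mode, _ in ranked[:keep]]


# Illustrative costs only, not measured data.
example_costs = {"16x16": 1200, "16x8": 1150, "8x16": 1310, "8x8": 1090}
selected = prefilter_inter_modes(example_costs)
print(selected)  # ['8x8', '16x8']
```

Restricting FME to a fixed number of candidate modes keeps the per-MB workload constant, which fits the predictable data flow that the abstract identifies as important for hardware.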
