Implementation of sum of absolute difference using optimized partial summation term reduction

Video is used across many application domains, including medicine, security, and surveillance. Video coding can be implemented in hardware or in software, and in both cases quality and controllability are required. Motion estimation (ME), the process of determining motion vectors, is a fundamental step in extracting activity from video. Motion estimation based on the Block Matching Algorithm (BMA) is the most popular method: the displacement of objects between consecutive frames is computed using a matching criterion, called the cost function, which measures the distortion between blocks. The massive computation associated with block matching prevents software implementations from running in real time and motivates hardware implementation. Because it is simple to implement in hardware, the Sum of Absolute Differences (SAD) is the preferred cost function for block matching. Several SAD-based architectures have been developed to improve the hardware efficiency and computational speed of block-matching algorithms. This paper gives a comparative analysis of area and speed of operation for Sequential, Pipeline, and Parallel architectures for SAD implementation. The Parallel architecture provides the best throughput at the cost of the highest resource utilization. Computing the absolute differences (AD) and summing them requires adders and carry-propagation mechanisms. This paper presents an optimized architecture for accumulating the computed ADs in the Parallel architecture using a partial summation term reduction technique, which reduces the number of adders by 40% and improves speed of operation by around 12% to 43% across various FPGA families.
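As an illustration of the SAD cost function described above, the following Python sketch computes the SAD of two equally sized blocks in two ways: with a single running accumulator, loosely analogous to a sequential datapath, and with a generic pairwise adder tree, loosely analogous to a parallel one. The function names and the sample blocks are hypothetical, and the tree shown is an ordinary balanced reduction, not the partial summation term reduction technique presented in the paper.

```python
def absolute_differences(current_block, reference_block):
    """Return the per-pixel absolute differences of two equally sized blocks."""
    return [
        abs(c - r)
        for row_c, row_r in zip(current_block, reference_block)
        for c, r in zip(row_c, row_r)
    ]


def sad_sequential(current_block, reference_block):
    """Accumulate the absolute differences one at a time."""
    total = 0
    for ad in absolute_differences(current_block, reference_block):
        total += ad
    return total


def sad_adder_tree(current_block, reference_block):
    """Sum the absolute differences pairwise, level by level.

    Returns (sad, levels), where levels is the number of pairwise
    addition stages used. This is a generic balanced tree (an assumption
    for illustration), not the paper's optimized reduction.
    """
    terms = absolute_differences(current_block, reference_block)
    if not terms:
        return 0, 0
    levels = 0
    while len(terms) > 1:
        # Add neighbouring terms; an unpaired last term passes through.
        next_terms = [terms[i] + terms[i + 1] for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            next_terms.append(terms[-1])
        terms = next_terms
        levels += 1
    return terms[0], levels


# Hypothetical 2x2 blocks; the absolute differences are 2, 2, 5 and 7.
current = [[10, 20], [30, 40]]
reference = [[12, 18], [25, 47]]
print(sad_sequential(current, reference))   # 16
print(sad_adder_tree(current, reference))   # (16, 2)
```

In a block-matching search, the candidate block with the smallest SAD would be selected as the best match. Both functions return the same sum; they differ only in how the additions are ordered, which is the aspect a hardware accumulation stage would optimize.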
