On the impacts of pel decimation and High-Vt/Low-Vdd on SAD calculation

As the number of pixels per frame grows with new high-definition video coding standards such as HEVC, pel decimation becomes a viable means of increasing the energy efficiency of Sum of Absolute Differences (SAD) calculation. This paper presents a VLSI architecture that can be configured to compute the SAD of 4×4 pixel blocks with no subsampling or with 2:1 or 4:1 subsampling (pel decimation). The proposed architecture was synthesized for 130nm, 90nm, 65nm and 45nm standard cell libraries, assuming both nominal and Low-Vdd/High-Vt (LH) cases, for both maximum throughput and a given target throughput. The impacts of subsampling and Low-Vdd/High-Vt on delay, power and energy efficiency are analyzed. Across the resulting 16 syntheses (four technologies, two Vdd/Vt cases, two throughput constraints), the 45nm/LH configurable SAD architecture achieved the highest energy efficiency at the target frequency when operating in 4:1 pel decimation mode, spending only 2.19pJ per 4×4 block, which is about 20.64 times less energy than the 130nm/nominal configurable architecture operating in full SAD mode. Aside from the improvements achieved by using LH, pel decimation alone was responsible for energy reductions of 40% and 60% in the configurable architecture when 2:1 and 4:1 subsampling are chosen, respectively.
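
The three operating modes can be illustrated with a minimal software sketch. It is not the proposed hardware architecture, only a functional model of a 4×4 SAD with optional pel decimation. The abstract does not state which subsampling patterns are used, so the patterns below (a checkerboard for 2:1 and one pixel per 2×2 group for 4:1) are assumptions, as are the function name `sad_4x4` and the mode strings.

```python
def sad_4x4(cur, ref, mode="full"):
    """Functional model of a configurable 4x4 SAD.

    cur, ref: 4x4 blocks given as lists of 4 rows of 4 integer pixel values.
    mode: "full" (16 pixels), "2:1" (8 pixels) or "4:1" (4 pixels).
    The subsampling patterns are assumed for illustration only.
    """
    if mode == "full":
        keep = lambda r, c: True
    elif mode == "2:1":
        # Assumed pattern: checkerboard, keeps 8 of 16 pixels.
        keep = lambda r, c: (r + c) % 2 == 0
    elif mode == "4:1":
        # Assumed pattern: top-left pixel of each 2x2 group, keeps 4 of 16.
        keep = lambda r, c: r % 2 == 0 and c % 2 == 0
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # Sum the absolute differences over the pixels that the mode keeps.
    return sum(
        abs(cur[r][c] - ref[r][c])
        for r in range(4)
        for c in range(4)
        if keep(r, c)
    )


if __name__ == "__main__":
    cur = [[10, 10, 10, 10] for _ in range(4)]
    ref = [[7, 7, 7, 7] for _ in range(4)]
    for mode in ("full", "2:1", "4:1"):
        print(mode, sad_4x4(cur, ref, mode))
```

In this model, 2:1 and 4:1 decimation halve and quarter the number of absolute differences accumulated per block, which is the source of the computational savings; the subsampled SAD is an approximation of the full SAD, not an equal value.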
