VLSI implementation of 16-point DCT for H.265/HEVC using Walsh-Hadamard transform and lifting scheme

In this paper, a fast 16-point DCT is implemented using a multiplier-less architecture. The 16-point DCT matrix is first decomposed into sparse sub-matrices to reduce the number of multiplications; the remaining multiplications are then eliminated entirely using the lifting scheme. As a result, the computational complexity of the architecture is much lower than that of a direct implementation of the 16-point DCT. In a software implementation, a PSNR of 45 dB is achieved for the “Lena” image. The VLSI implementation has been carried out in a 90-nm standard-cell technology at a clock frequency of 150 MHz.
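To illustrate how a lifting scheme can remove multipliers, the sketch below factors a single plane rotation into three lifting steps whose coefficients are rounded to dyadic fractions k / 2^bits. This is a generic textbook construction, not the factorization used in the paper: the angle, the 8-bit coefficient precision, and all function names are assumptions made for illustration. In hardware, each product by the small constant k would be expanded into shifts and adds; here an integer multiply followed by a right shift stands in for that.

```python
import math

def dyadic(value, bits):
    """Return integer k such that value is approximately k / 2**bits."""
    return round(value * (1 << bits))

def lift(a, b, k, bits):
    """One integer lifting step: a + floor(k * b / 2**bits)."""
    return a + ((k * b) >> bits)

def unlift(a, b, k, bits):
    """Exact inverse of lift() for the same b, k and bits."""
    return a - ((k * b) >> bits)

def coefficients(theta, bits):
    # Rotation by theta = three lifting steps with p, u, p (assumed factorization)
    p = dyadic(-math.tan(theta / 2), bits)
    u = dyadic(math.sin(theta), bits)
    return p, u

def rotate_forward(x, y, theta, bits=8):
    """Approximate (x, y) -> (x cos - y sin, x sin + y cos) on integers."""
    p, u = coefficients(theta, bits)
    x = lift(x, y, p, bits)
    y = lift(y, x, u, bits)
    x = lift(x, y, p, bits)
    return x, y

def rotate_inverse(x, y, theta, bits=8):
    """Undo rotate_forward by reversing the lifting steps."""
    p, u = coefficients(theta, bits)
    x = unlift(x, y, p, bits)
    y = unlift(y, x, u, bits)
    x = unlift(x, y, p, bits)
    return x, y

if __name__ == "__main__":
    theta = math.pi / 8
    fx, fy = rotate_forward(100, 50, theta)
    print("forward:", fx, fy)
    print("inverse:", rotate_inverse(fx, fy, theta))
```

Because each step adds a rounded function of the other channel, the inverse subtracts the same rounded quantity and recovers the integer input exactly, regardless of how coarsely the coefficients are quantized. Coarser coefficients reduce adder count at the cost of a larger deviation from the ideal rotation.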
