Computation of 2D 8×8 DCT Based on the Loeffler Factorization Using Algebraic Integer Encoding

This paper proposes a computational method for the 2D 8×8 DCT based on algebraic integers. The proposed algorithm is based on the Loeffler 1D DCT algorithm, and it is shown to operate with exact computation (i.e., error-free arithmetic) up to the final reconstruction step (FRS). The proposed algebraic integer architecture maintains error-free computation until an entire 8×8 block of DCT coefficients is computed, unlike algorithms in the literature that claim to be error-free but in fact introduce arithmetic errors between the column-wise and row-wise 1D DCT stages of a 2D DCT operation. Fast algorithms are proposed for the final reconstruction step using two approaches, namely the expansion factor and dyadic approximation. A digital architecture is also proposed for a particular FRS algorithm and is implemented on an FPGA platform for on-chip verification. The FPGA implementation operates at 360 MHz and is capable of a real-time throughput of $3.6\cdot 10^8$ 2D DCTs of size 8×8 per second, with a corresponding pixel rate of $2.3\cdot 10^{10}$ pixels per second. The digital architecture is synthesized using 180 nm CMOS standard cells and shows a chip area of 7.41 mm$^2$. The CMOS design is predicted to operate at a clock frequency of 893 MHz, with a dynamic power consumption of 13.22 mW/MHz$\cdot$V$_{sup}^2$.
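The idea of exact arithmetic followed by a single final reconstruction step can be illustrated with a minimal Python sketch. It is not the encoding used in the paper: as a simplifying assumption it works in the ring Z[sqrt(2)] (numbers of the form a + b·sqrt(2) with integer a and b), and the names `AlgInt` and `reconstruct_dyadic` are hypothetical. All intermediate additions and multiplications act on integer pairs and are therefore error-free; an approximation of sqrt(2) by a dyadic rational k/2^bits enters only once, at the end.

```python
from dataclasses import dataclass
from fractions import Fraction
import math


@dataclass(frozen=True)
class AlgInt:
    """Hypothetical element a + b*sqrt(2) of Z[sqrt(2)], held as two exact integers."""
    a: int
    b: int

    def __add__(self, other):
        return AlgInt(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2, integers only
        return AlgInt(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)


def reconstruct_dyadic(x, bits=12):
    """Final reconstruction (illustrative): replace sqrt(2) by k / 2**bits.

    This is the only place where an approximation is introduced.
    """
    k = round(math.sqrt(2) * (1 << bits))  # integer numerator of the dyadic constant
    return Fraction(x.a * (1 << bits) + x.b * k, 1 << bits)


SQRT2 = AlgInt(0, 1)

# Multiplying by sqrt(2) twice is exactly a multiplication by 2.
y = (AlgInt(3, 0) * SQRT2) * SQRT2

# (1 + sqrt2)^2 = 3 + 2*sqrt2, still stored exactly as the pair (3, 2).
z = AlgInt(1, 1) * AlgInt(1, 1)

approx = float(reconstruct_dyadic(z))
```

In this sketch the reconstruction error depends only on `bits` and on the size of the `b` component, not on how many operations preceded it, which is the property the abstract attributes to deferring the FRS until the whole 8×8 block has been computed.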
