CORDIC based fast algorithm for power-of-two point DCT and its efficient VLSI implementation

In this paper, we present a coordinate rotation digital computer (CORDIC) based fast algorithm for power-of-two point DCT and develop a corresponding efficient VLSI implementation. The proposed algorithm has several distinctive advantages: a regular Cooley-Tukey FFT-like data flow, an identical post-scaling factor, and rotation angles that form an arithmetic sequence. Using trigonometric identities, the number of CORDIC types is reduced dramatically, which provides an efficient way to overcome the lack of synchronization among CORDICs with different rotation angles. By fully reusing a uniform processing element (PE), the 8-point DCT requires only four carry-save adder (CSA) based PEs of two different types. Compared with other known architectures, the proposed 8-point DCT architecture has higher modularity, lower hardware complexity, higher throughput, and better synchronization.
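For readers unfamiliar with CORDIC, the following is a minimal sketch of a textbook rotation-mode CORDIC in Python. It is not the algorithm or architecture proposed in the paper. The function name `cordic_rotate` and its parameters are illustrative assumptions, and floating-point arithmetic stands in for the shift-and-add hardware. It shows one general property that is relevant here: in the basic CORDIC, the gain depends only on the number of iterations and not on the rotation angle, so a single post-scaling constant can serve every rotation.

```python
import math


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians with textbook CORDIC micro-rotations.

    Returns the unscaled vector and the accumulated gain, so the caller can
    apply one post-scaling step. Converges for |angle| up to about 1.74 rad.
    Hypothetical helper for illustration only.
    """
    z = angle          # residual angle still to be rotated
    gain = 1.0         # product of the micro-rotation gains
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # direction of this micro-rotation
        shift = 2.0 ** -i                # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)        # table lookup in hardware
        gain *= math.sqrt(1.0 + shift * shift)
    return x, y, gain


# Rotate the unit vector by pi/8 and post-scale by the constant gain.
xr, yr, k = cordic_rotate(1.0, 0.0, math.pi / 8)
cos_est, sin_est = xr / k, yr / k
```

Because every iteration is executed regardless of the angle, calling `cordic_rotate` with a different angle and the same iteration count returns the same gain, which is why the scaling can be deferred to a single final multiplication.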
