Energy-efficient Hardware Accelerators for the SA-DCT and Its Inverse

The explosive growth of the mobile multimedia industry has accentuated the need for efficient VLSI implementations of the associated computationally demanding signal processing algorithms. In particular, the short battery life caused by the excessive power consumption of mobile devices has become the biggest obstacle facing truly mobile multimedia. We propose novel hardware accelerator architectures for two of the most computationally demanding algorithms of the MPEG-4 video compression standard: the forward and inverse shape-adaptive discrete cosine transforms (SA-DCT/IDCT). These accelerators have been designed using general low-energy design philosophies at the algorithmic and architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art.
