Energy efficient hardware acceleration of multimedia processing tools

The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents of modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design re-usable, optimised hardware acceleration cores. To support these conclusions, the work in this thesis focuses on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions. Results show an improvement on state-of-the-art algorithms, with future potential for even greater savings.
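To make the CMM idea concrete, the following is a minimal Python sketch of a constant matrix multiplication evaluated without multipliers, using only shifts, additions and subtractions. It assumes integer constants recoded into a signed-digit form; this is a common way to realise constant multiplication in hardware and is offered only as an illustration, not as the algorithm proposed in the thesis. The function names `csd_digits`, `shift_add_multiply` and `cmm` are hypothetical.

```python
def csd_digits(c):
    """Recode integer c as signed powers of two: a list of (shift, sign) pairs.

    Illustrative helper (assumed name): c == sum(sign * 2**shift).
    """
    digits = []
    shift = 0
    sign = 1 if c >= 0 else -1
    c = abs(c)
    while c:
        if c & 1:
            # Choose +1 when c mod 4 == 1 and -1 when c mod 4 == 3,
            # so that the next bit becomes zero (no adjacent nonzero digits).
            d = 2 - (c & 3)
            digits.append((shift, sign * d))
            c -= d
        c >>= 1
        shift += 1
    return digits


def shift_add_multiply(x, c):
    """Compute c * x using only shifts and additions/subtractions."""
    return sum(s * (x << k) for k, s in csd_digits(c))


def cmm(matrix, vector):
    """Multiply a constant integer matrix by an integer vector, multiplier-free."""
    return [
        sum(shift_add_multiply(x, c) for c, x in zip(row, vector))
        for row in matrix
    ]


if __name__ == "__main__":
    constants = [[7, -3], [5, 12]]
    print(csd_digits(7))           # 7 = 8 - 1
    print(cmm(constants, [2, -4]))
```

Each nonzero signed digit costs one adder or subtractor in a direct hardware mapping, so fewer nonzero digits (and shared partial sums across rows) generally mean less logic. Searching for a good sharing of such partial sums is the kind of optimisation problem a CMM design algorithm has to solve.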
