An Efficient SIMD Architecture with Parallel Memory for 2D Cosine Transforms of Video Coding

This paper proposes an efficient SIMD architecture with parallel memory for the 2D cosine transforms of multiple video standards. A novel parallel memory scheme provides conflict-free parallel access in both the horizontal and vertical directions, in either successive or even/odd mode, and eliminates the need for data permutation and matrix transposition. Furthermore, application-specific instructions are presented to accelerate the transform kernels, such as butterfly and rotate operations with scaling, rounding and clipping. Simulation results show that the proposed architecture achieves significant performance improvement at a low hardware cost: 3.2 K equivalent gates for the parallel memory subsystem (not including SRAMs) and 19.8 K for the arithmetic units at 250 MHz in a 0.18 µm process.
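The abstract does not give the memory mapping or the instruction semantics, so the following is only an illustrative sketch and not the paper's scheme. It assumes a classic skewed storage layout, where element (row, col) of an N x N block is placed in bank (row + col) mod N, to show how both a row and a column can be read from N banks without conflict in successive mode. It also sketches a butterfly with scaling, rounding and clipping. The names `bank`, `row_banks`, `col_banks`, `conflict_free` and `butterfly`, the block size `N = 4`, and the 16-bit clipping range are all assumptions made for illustration.

```python
N = 4  # assumed number of banks and block size; not taken from the paper


def bank(row, col, n=N):
    """Skewed mapping (assumed): element (row, col) is stored in bank (row + col) mod n."""
    return (row + col) % n


def row_banks(row, n=N):
    """Banks touched when reading one full row (horizontal access)."""
    return [bank(row, c, n) for c in range(n)]


def col_banks(col, n=N):
    """Banks touched when reading one full column (vertical access)."""
    return [bank(r, col, n) for r in range(n)]


def conflict_free(banks):
    """An access is conflict-free when every element comes from a different bank."""
    return len(set(banks)) == len(banks)


def butterfly(a, b, shift=1, lo=-32768, hi=32767):
    """Sum/difference pair, scaled by a right shift with round-to-nearest, then clipped.

    The shift amount and the 16-bit range are illustrative assumptions.
    """
    rnd = (1 << (shift - 1)) if shift > 0 else 0  # rounding offset added before the shift
    s = (a + b + rnd) >> shift
    d = (a - b + rnd) >> shift

    def clip(x):
        return max(lo, min(hi, x))

    return clip(s), clip(d)
```

With this layout, a column read needs no separate transposition step because the N elements of any column already sit in N distinct banks. The even/odd access mode mentioned in the abstract is not covered by this simple skew and would need a different mapping.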
