Memory Sub-System Optimization on a SIMD Video Signal Processor for Multi-Standard CODEC

Video has become a key multimedia application in embedded systems, and various standards have been developed for specific purposes. As a result, embedded systems for video CODECs must combine high performance with flexible functionality. SIMD extensions are a well-known approach to overcoming the performance bottlenecks of programmable processors, especially in multimedia operations. This paper proposes a novel linear SIMD processing array with an intelligent local memory structure, together with its associated software optimization for video decoding. A complete evaluation, including component design, system integration, and cycle-accurate simulation, is carried out with a system-level SoC design tool. Compared to conventional SIMD approaches, the proposed method reduces the execution cycle count by approximately 25%.
