Row-column decomposition based 2D transform optimization on subword parallel processors

This paper discusses 2D block transform implementations based on row-column decomposition, in which the matrix transpose plays a crucial role. A subword-parallel VLIW processor architecture that supports simultaneous data processing and matrix transposition provides the functionality required for fast and flexible transform implementations. In addition, new instructions are proposed to further speed up the transforms in H.264/AVC. With the proposed architectural optimizations, a speedup of 2.7 is achieved for the 2D DCT/IDCT and a speedup of more than two for the H.264/AVC transforms, compared with sequential implementations.
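To illustrate why the transpose is central to row-column decomposition, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes the 4x4 integer core transform matrix commonly given for H.264/AVC as the 1D kernel; the names `CF`, `transform_1d`, `transpose`, `transform_2d_row_column`, and `transform_2d_direct` are hypothetical and chosen only for this example. The 2D transform is computed as a 1D pass over the rows, a transpose, a second 1D pass, and a final transpose, and the result is checked against the direct matrix form Y = CF * X * CF^T.

```python
# Assumed 1D kernel: the 4x4 forward integer core transform matrix
# commonly given for H.264/AVC (illustrative choice, scaling omitted).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def transform_1d(vec):
    """Apply the 1D kernel to one 4-element row: returns CF * vec."""
    return [sum(c * x for c, x in zip(row, vec)) for row in CF]


def transpose(matrix):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def transform_2d_row_column(block):
    """2D transform by row-column decomposition.

    Pass 1 transforms every row, the transpose turns columns into rows,
    pass 2 reuses the same row routine, and the last transpose restores
    the original orientation.
    """
    rows_done = [transform_1d(row) for row in block]
    cols_as_rows = transpose(rows_done)
    cols_done = [transform_1d(row) for row in cols_as_rows]
    return transpose(cols_done)


def transform_2d_direct(block):
    """Reference result computed directly as Y = CF * X * CF^T."""
    n = len(CF)
    return [
        [
            sum(CF[i][k] * block[k][l] * CF[j][l]
                for k in range(n) for l in range(n))
            for j in range(n)
        ]
        for i in range(n)
    ]


if __name__ == "__main__":
    sample = [
        [5, 11, 8, 10],
        [9, 8, 4, 12],
        [1, 10, 11, 4],
        [19, 6, 15, 7],
    ]
    assert transform_2d_row_column(sample) == transform_2d_direct(sample)
```

In this sketch both passes reuse the same row-oriented routine, so the only extra work between them is the transpose. On a subword-parallel machine each row pass can process several elements per instruction, which is why overlapping the transpose with data processing matters for overall throughput.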
