An efficient method to implement the DCT algorithm on VLIW architectures

This paper presents a method for computing the DCT defined in AVS on VLIW DSPs. Based on a decomposition of the IDCT transform matrix, the matrix multiplications are implemented with complex multiplications. To reduce register pressure, the data are organized so that packed transform matrix coefficients are reused, which makes the proposed method better suited to loop unrolling and software pipelining. As a result, higher instruction-level parallelism (ILP) is achieved and computational efficiency improves. When implemented on VLIW DSPs, the proposed method saves 31.1% of the computation time compared with existing methods, and it is 4.28 times faster than the fast algorithm in the AVS standard.
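
To illustrate why complex multiplications can stand in for small matrix multiplications, the following minimal sketch shows that one complex product yields both outputs of a 2x2 block of the form [[c, -d], [d, c]]. The function names and the coefficient values are hypothetical and chosen only for illustration; this is not the decomposition used in the paper, and it assumes a target whose instruction set offers a packed complex multiply.

```python
def complex_mul(a, b):
    """Multiply two complex numbers stored as (real, imag) integer pairs."""
    ar, ai = a
    br, bi = b
    return (ar * br - ai * bi, ar * bi + ai * br)


def block_direct(c, d, x0, x1):
    """Reference result: [[c, -d], [d, c]] applied to [x0, x1] with four multiplies."""
    return (c * x0 - d * x1, d * x0 + c * x1)


def block_complex(c, d, x0, x1):
    """Same result expressed as the single complex product (c + jd) * (x0 + j*x1).

    The coefficient pair (c, d) plays the role of a packed coefficient word:
    it can stay in one register and be reused for every input pair.
    """
    return complex_mul((c, d), (x0, x1))


if __name__ == "__main__":
    # Hypothetical coefficients and inputs, for illustration only.
    print(block_direct(10, 4, 3, -5))
    print(block_complex(10, 4, 3, -5))
```

Both functions return the same pair, so a 2x2 block of this shape costs one complex multiply per input pair. Because the packed coefficient pair stays fixed across inputs, a loop over many input pairs can keep it in a register, which is the kind of reuse that lowers register pressure in an unrolled loop.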