Memory access overhead reduction for a digital color copier implementation using a VLIW digital signal processor

A block-based implementation of a digital color copier algorithm on the TMS320C64x VLIW DSP is conducted in order to reduce the overhead of memory accesses. We developed two strategies: whole block caching and partial block progressive caching. The former chooses a block size that can be fully accommodated by the L1 data cache, so that no conflict or capacity cache misses occur, while the latter keeps only the data needed for processing one or two lines of the output image, in order to maximize the line length. It is shown that blocking reduces cache misses but increases the overhead of software pipelining because of the reduced loop lengths. Implementation results showing the respective overheads of cache misses, software pipelining, and DMA are presented to guide the selection of the optimum block size.
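The trade-off between the two strategies can be illustrated with a rough capacity calculation. The following C sketch is not the authors' implementation: the function names, the 16 KB cache size, the square block shape, and the assumption that one input and one output buffer share the cache are all hypothetical choices made for illustration. It checks capacity only and ignores set associativity and data layout, which also determine whether conflict misses occur.

```c
#include <stddef.h>

/* Assumed L1 data cache capacity in bytes (hypothetical; check the device datasheet). */
#define L1D_BYTES 16384u

/*
 * Whole block caching (sketch): largest square block side such that one
 * input block and one output block of the same size both fit in the cache.
 */
size_t whole_block_side(size_t cache_bytes, size_t bytes_per_pixel)
{
    /* Pixels available per buffer when the cache is split between input and output. */
    size_t budget = cache_bytes / (2 * bytes_per_pixel);
    size_t side = 0;

    /* Integer square root by linear search; cheap for cache-sized budgets. */
    while ((side + 1) * (side + 1) <= budget)
        side++;
    return side;
}

/*
 * Partial block progressive caching (sketch): only the input rows needed for
 * the current output rows stay resident, so the line can be much longer.
 * rows_kept: input rows held (for example, the height of a filter window).
 * out_rows:  output rows produced per pass (1 or 2 in the abstract).
 */
size_t progressive_line_length(size_t cache_bytes, size_t bytes_per_pixel,
                               size_t rows_kept, size_t out_rows)
{
    return cache_bytes / ((rows_kept + out_rows) * bytes_per_pixel);
}
```

Under these assumptions, with one byte per pixel, `whole_block_side(L1D_BYTES, 1)` gives a side of 90 pixels, whereas `progressive_line_length(L1D_BYTES, 1, 3, 1)` gives a line of 4096 pixels. The longer line means longer inner loops, which is the property the abstract links to lower software pipelining overhead.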
