Study of cache system in video signal processors

Memory system design is especially important for video signal processing, where the video signal processor (VSP) not only consumes large volumes of data but also needs very high bandwidth and low latency. Although caches have become ubiquitous in modern systems, their performance still lags behind that of processors. A number of modifications to traditional caches have therefore emerged: victim caches, stream buffers, data prefetching techniques, etc. However, few studies have examined cache memory for VSPs. We present a case study based on extensive trace-driven scheduling, which shows that while stream buffers and stride prediction tables are very effective for streaming video data, they should be applied differently in dedicated VSPs with higher degrees of parallelism than in current super-scalar workstation architectures.
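To make the idea of a stride prediction table concrete, the following is a minimal sketch, not the configuration studied here. It assumes a table indexed by the address of the load instruction, unbounded capacity, and a prefetch issued only after the same nonzero stride has been observed twice in a row. The names `StridePredictionTable`, `access`, and `row_walk` are hypothetical and chosen for illustration only.

```python
class StridePredictionTable:
    """Toy stride predictor indexed by the address (PC) of a load instruction."""

    def __init__(self):
        # pc -> (last_address, last_stride); unbounded here for simplicity,
        # whereas a hardware table would have a fixed number of entries.
        self.entries = {}

    def access(self, pc, address):
        """Record a load and return an address to prefetch, or None."""
        prediction = None
        if pc in self.entries:
            last_address, last_stride = self.entries[pc]
            stride = address - last_address
            # Prefetch only when the same nonzero stride is seen twice in a row.
            if stride != 0 and stride == last_stride:
                prediction = address + stride
            self.entries[pc] = (address, stride)
        else:
            # First time this load is seen: no stride is known yet.
            self.entries[pc] = (address, 0)
        return prediction


def row_walk(base, stride, count):
    """Addresses touched by a loop that steps through memory at a fixed stride."""
    return [base + i * stride for i in range(count)]


# One load instruction (hypothetical PC 0x400) walking memory in 64-byte steps.
spt = StridePredictionTable()
predictions = [spt.access(pc=0x400, address=a) for a in row_walk(0x1000, 64, 5)]
```

In this sketch the first two accesses only train the entry, and every later access yields the next address in the sequence as a prefetch candidate. Regular streaming access of this kind is typical of the video data discussed above.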
