A smart cache for improved vector performance

Abstract

As the speed of microprocessors increases at a breathtaking rate, the gap between processor and memory system performance keeps widening. To alleviate this problem, all modern processors contain caches, but even with caches, processors cannot achieve their peak performance. We propose a mechanism, smart caching, which extends the power of conventional memory subsystems by including a prefetch unit. This prefetch unit is responsible for using the available memory bandwidth efficiently by fetching data from memory before they are actually needed. Prefetching lets high-level application knowledge improve the performance of the memory system, which currently constrains the performance of most systems. While prefetching does not reduce the latency of memory accesses, it hides this latency by overlapping memory access with instruction execution.
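
The claim that prefetching hides latency without reducing it can be illustrated with a toy timing model. The sketch below is not the smart cache design itself; the function names, the cycle counts, and the assumption of a prefetch unit that issues one request every `issue_interval` cycles are all hypothetical and chosen only for illustration.

```python
def blocking_cycles(n, latency, compute):
    """Total cycles when every access stalls the processor for the full latency."""
    return n * (latency + compute)


def prefetched_cycles(n, latency, compute, issue_interval=1):
    """Total cycles when a (hypothetical) prefetch unit streams data ahead of use.

    Request i is issued at cycle i * issue_interval and arrives `latency`
    cycles later, so the per-access latency is unchanged. The processor
    starts on element i as soon as both the data has arrived and the
    previous element is finished.
    """
    finish = 0
    for i in range(n):
        arrival = i * issue_interval + latency  # latency itself is not reduced
        start = max(arrival, finish)            # wait for data or for the processor
        finish = start + compute
    return finish


if __name__ == "__main__":
    # Assumed numbers: 100 vector elements, 20-cycle memory latency,
    # 2 cycles of computation per element.
    print(blocking_cycles(100, 20, 2))    # 2200
    print(prefetched_cycles(100, 20, 2))  # 220
```

With these assumed numbers, the latency is paid once at the start of the stream and then overlaps with computation, so the total drops from 2200 to 220 cycles even though each individual access still takes 20 cycles.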