Balanced prefetching aggressiveness controller for NoC-based multiprocessor

The performance gap between the memory hierarchy and the processor is a well-known issue, and prefetching is often used to reduce it. This technique fetches data from memory and places it in the private cache before the processor requests it. Thus, as more prefetching transactions are performed (that is, as prefetching becomes more aggressive), the miss rate in the first cache levels tends to fall. However, overly aggressive prefetching can cause cache pollution, increase network traffic, and thereby degrade system performance. In a multiprocessor platform, prefetching by one core can interfere with the operation of other cores because they share resources such as memory and network bandwidth. Overly aggressive prefetching by one core can overload the network, and the added traffic delays network requests and increases the processor penalty. In this context, this paper presents a Balanced Prefetching Aggressiveness Controller for a multiprocessor platform that minimizes the processor penalty. We tested the proposed controller on a NoC-based multiprocessor built on the SPARC V8. The results show a reduction in processor penalty of up to 23% (7% on average), an average reduction of 34% in cache pollution, and a 30% increase in prefetching accuracy for concurrent applications, compared with a system that uses fixed prefetching aggressiveness.
