Adaptive granularity and coordinated management for timely prefetching in multi-core systems

Over the last decade, a variety of hardware prefetching techniques have been proposed to improve system performance. However, untimely prefetching may pollute caches and result in significant performance degradation. In this work, we introduce Adaptive Granularity and coordinated Prefetching (AGP), which combines a coarse-grained and a fine-grained prefetching mechanism to provide a better caching environment for parallel applications. AGP targets prefetch degree adjustment and prefetch location selection, and tries to minimize the interference caused by each core's prefetcher. AGP can issue more timely prefetch requests, reducing cache pollution and contention. Across a variety of PARSEC benchmarks, AGP improves performance by 6.5% (up to 36%) on a 4-core system compared to a system without prefetching.
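
To make the notion of degree adjustment concrete, the following is a minimal sketch of a generic feedback-driven throttle, not the AGP mechanism itself, whose details are not given in this abstract. The counter names, the thresholds, and the halve/double policy are all illustrative assumptions: a core reduces its prefetch degree when prefetches appear to pollute the cache and raises it when useful prefetches arrive late.

```python
from dataclasses import dataclass

# Hypothetical limits and thresholds; none of these values come from the paper.
MIN_DEGREE, MAX_DEGREE = 1, 8
LATE_THRESHOLD = 0.25       # share of useful prefetches that arrived late
POLLUTION_THRESHOLD = 0.10  # share of demand misses blamed on prefetch evictions


@dataclass
class IntervalStats:
    """Per-core counters sampled over one interval (assumed, for illustration)."""
    useful_prefetches: int
    late_prefetches: int
    demand_misses: int
    pollution_misses: int


def adjust_degree(degree: int, stats: IntervalStats) -> int:
    """Return the prefetch degree to use in the next interval."""
    late_ratio = stats.late_prefetches / max(stats.useful_prefetches, 1)
    pollution_ratio = stats.pollution_misses / max(stats.demand_misses, 1)

    # Pollution is checked first: a polluting prefetcher is throttled down.
    if pollution_ratio > POLLUTION_THRESHOLD:
        return max(MIN_DEGREE, degree // 2)
    # Late but otherwise harmless prefetches suggest fetching further ahead.
    if late_ratio > LATE_THRESHOLD:
        return min(MAX_DEGREE, degree * 2)
    return degree
```

In a multi-core setting such a function would be evaluated once per core per interval; how AGP coordinates the cores and selects the prefetch location is outside the scope of this sketch.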
