Data cache prefetching via context directed pattern matching for coarse-grained reconfigurable arrays

This paper proposes a context directed pattern matching (CDPM) mechanism that uses the context of a coarse-grained reconfigurable array (CGRA) as a guide to improve cache prefetching accuracy. CDPM generates a prefetch pattern the first time a context is executed and reuses that pattern to issue prefetch requests whenever the context is executed again on the CGRA. To eliminate outdated prefetch patterns, CDPM also evaluates the prefetching accuracy of each pattern at run time. In experiments, CDPM improved performance by 31.1% on average compared to execution without any prefetching and by 7.7% compared to state-of-the-art prefetching techniques.
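The learn, reuse, and evaluate cycle described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class name `ContextPatternPrefetcher`, the `run_context` method, the use of a plain address list as the "pattern", the per-execution accuracy measure, and the `min_accuracy` threshold are all hypothetical choices made for illustration.

```python
class ContextPatternPrefetcher:
    """Toy model of context-keyed prefetch patterns (hypothetical, for illustration)."""

    def __init__(self, min_accuracy=0.5):
        # Assumed eviction threshold; the abstract does not give a value.
        self.min_accuracy = min_accuracy
        # Maps a context identifier to the address list recorded for it.
        self.patterns = {}

    def run_context(self, context_id, demand_addresses):
        """Model one execution of a context and return the prefetches issued."""
        pattern = self.patterns.get(context_id)
        if pattern is None:
            # First execution: record the access stream as the prefetch pattern.
            self.patterns[context_id] = list(demand_addresses)
            return []

        # Repeat execution: reuse the stored pattern to issue prefetches.
        prefetched = list(pattern)
        demanded = set(demand_addresses)
        useful = sum(1 for addr in prefetched if addr in demanded)

        # Run-time evaluation: drop the pattern if too few prefetches were useful
        # in this execution, so the next execution records a fresh pattern.
        if prefetched and useful / len(prefetched) < self.min_accuracy:
            del self.patterns[context_id]
        return prefetched
```

In this model, calling `run_context("ctx0", [0x100, 0x140])` twice returns an empty list the first time and the learned addresses the second time. A third call with a different address stream still issues the stale pattern once, measures zero accuracy, and evicts it, so the following execution of the same context starts learning again.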
