Local memory store (LMStr): A hardware-controlled shared scratchpad for multicores

We present an on-chip memory store called the “Local Memory Store” (LMStr). The LMStr can be used alongside a regular cache hierarchy or on its own as a redesigned scratchpad memory (SPM). It is a special kind of SPM that is shared among the cores of a multicore processor. Management of the store itself is controlled by hardware; however, compiler support is instrumental in deciding which data items and data types should live in the store. Critical data should be placed in the LMStr according to its type (i.e., local, global, static, or temporary), and the programmer can optionally give the compiler hints to place specific data items in the LMStr. We evaluate our design using a matrix multiplication micro-application and multiple Mantevo mini-applications. Our results show that LMStr improves data movement by up to 21% compared to a cache alone, with a mere 3% area overhead. In addition, LMStr improves the cycles per memory access by up to 40%.
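The abstract describes placement only at the level of data types and programmer hints. The following is a minimal Python sketch of what a compiler-side selection pass could look like, assuming a fixed LMStr capacity, known (profiled or estimated) access counts, and a greedy ranking in which hinted items come first and the remaining items are ordered by a type-weighted access density. The names `DataItem`, `select_for_lmstr`, and `TYPE_WEIGHT`, the weight values, and the example data items are hypothetical illustrations, not the LMStr authors' method; the hardware-managed side of the store is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """Data categories named in the abstract."""
    LOCAL = "local"
    GLOBAL = "global"
    STATIC = "static"
    TEMPORARY = "temporary"


@dataclass
class DataItem:
    name: str
    dtype: DataType
    size_bytes: int
    accesses: int        # assumed: profiled or estimated access count
    hint: bool = False   # programmer hint: "place this item in the LMStr"

    def __post_init__(self):
        if self.size_bytes <= 0:
            raise ValueError("size_bytes must be positive")


# Hypothetical per-type weights; the source does not specify any values.
TYPE_WEIGHT = {
    DataType.TEMPORARY: 1.0,
    DataType.LOCAL: 0.9,
    DataType.STATIC: 0.7,
    DataType.GLOBAL: 0.5,
}


def select_for_lmstr(items, capacity_bytes):
    """Greedily pick items for a fixed-capacity LMStr (illustrative only).

    Hinted items are ranked first; the rest are ordered by type-weighted
    access density (accesses per byte). An item that does not fit is
    skipped, and smaller items further down the ranking may still fit.
    """
    def priority(item):
        density = item.accesses / item.size_bytes
        return (item.hint, TYPE_WEIGHT[item.dtype] * density)

    chosen, used = [], 0
    # sorted() is stable, so ties keep their input order even with reverse=True.
    for item in sorted(items, key=priority, reverse=True):
        if used + item.size_bytes <= capacity_bytes:
            chosen.append(item.name)
            used += item.size_bytes
    return chosen, used


# Usage with hypothetical data items from a tiled matrix multiplication.
items = [
    DataItem("A_tile", DataType.LOCAL, 4096, 50_000),
    DataItem("B_tile", DataType.LOCAL, 4096, 50_000),
    DataItem("C_accum", DataType.TEMPORARY, 2048, 40_000),
    DataItem("config", DataType.GLOBAL, 256, 10),
    DataItem("lookup", DataType.STATIC, 8192, 8_000, hint=True),
]
chosen, used = select_for_lmstr(items, capacity_bytes=16 * 1024)
print(chosen, used)  # ['lookup', 'C_accum', 'A_tile', 'config'] 14592
```

Ranking by value per byte and filling greedily is a common heuristic for this knapsack-like selection; an exact formulation such as integer linear programming would trade compile time for an optimal placement.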
