RDMM: Runtime dynamic migration mechanism of distributed cache for reconfigurable array processor

Abstract

Reconfigurable array processors have emerged as a powerful solution for accelerating computationally intensive applications. However, they can suffer from a data access bottleneck as the frequency of memory accesses rises. Current distributed cache designs for reconfigurable array processors have a high cache miss rate, and the resulting frequent accesses to external memory cause long memory access delays. To mitigate this problem, we present a Runtime Dynamic Migration Mechanism (RDMM) of distributed cache for reconfigurable array processors, which exploits the strong locality and high parallelism of their data accesses. Based on how frequently each processor accesses the remote cache, the mechanism dynamically schedules temporary, static data with a high access frequency to migrate from the remote cache into the processor's local migration storage table. A data search strategy based on the migration storage tables then locates data along the shortest path, which effectively reduces the access delay of the entire system and increases the memory bandwidth of the reconfigurable array processor. We evaluate the proposed mechanism on a reconfigurable array processor hardware platform. The experimental results show that, at the highest conflict rate, RDMM reduces access delay by up to 35.24% compared with the traditional distributed cache. Compared with Ref. [19], Ref. [20], Ref. [21] and Ref. [23], RDMM increases the working frequency by 15%, the hit rate by 6.1%, and the peak bandwidth by about 3×.
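The counting, migration, and search steps described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the hardware design evaluated in the paper: the class name `DistributedCache`, the modulo mapping of addresses to home nodes, the migration threshold, the LRU eviction of the migration storage table, and the latency constants are all hypothetical, and on-chip network distance is not modeled. Only read-only (static) data are handled, so a migrated entry is kept as a copy and no coherence traffic is simulated.

```python
from collections import OrderedDict

# Hypothetical latencies in cycles (assumed for illustration, not from the paper).
MIGRATION_TABLE_LATENCY = 1
LOCAL_CACHE_LATENCY = 2
REMOTE_CACHE_LATENCY = 12


class DistributedCache:
    """Toy model of RDMM-style migration for read-only data.

    Assumptions: each address has a home node (addr % num_nodes); every node
    keeps a small migration storage table (LRU order) and per-address
    counters of its own remote accesses.
    """

    def __init__(self, num_nodes, threshold=3, table_capacity=4):
        self.num_nodes = num_nodes
        self.threshold = threshold            # hypothetical: remote reads that trigger migration
        self.table_capacity = table_capacity  # hypothetical: entries per migration table
        self.banks = [dict() for _ in range(num_nodes)]          # home cache banks
        self.tables = [OrderedDict() for _ in range(num_nodes)]  # migration storage tables
        self.remote_counts = [dict() for _ in range(num_nodes)]  # per-node access counters

    def home(self, addr):
        return addr % self.num_nodes

    def preload(self, addr, value):
        """Place read-only data in its home bank (stands in for a fill from external memory)."""
        self.banks[self.home(addr)][addr] = value

    def read(self, node, addr):
        """Return (value, latency), searching the nearest storage first."""
        table = self.tables[node]
        if addr in table:                     # 1. local migration storage table
            table.move_to_end(addr)           # mark as most recently used
            return table[addr], MIGRATION_TABLE_LATENCY
        home = self.home(addr)
        if home == node:                      # 2. data homed in this node's own bank
            return self.banks[node][addr], LOCAL_CACHE_LATENCY
        value = self.banks[home][addr]        # 3. remote bank on another node
        self._count_and_maybe_migrate(node, addr, value)
        return value, REMOTE_CACHE_LATENCY

    def _count_and_maybe_migrate(self, node, addr, value):
        counts = self.remote_counts[node]
        counts[addr] = counts.get(addr, 0) + 1
        if counts[addr] >= self.threshold:    # frequently accessed remote data
            table = self.tables[node]
            if len(table) >= self.table_capacity:
                table.popitem(last=False)     # evict the least recently used entry
            table[addr] = value               # copy is safe because the data are read-only
            del counts[addr]                  # restart counting for this address


if __name__ == "__main__":
    cache = DistributedCache(num_nodes=4, threshold=3, table_capacity=4)
    cache.preload(5, "x")  # home node is 5 % 4 = 1
    print([cache.read(0, 5)[1] for _ in range(5)])
```

In this toy run, node 0 pays the remote latency for the first three reads of address 5; the third read reaches the threshold and copies the line into node 0's migration table, so later reads hit locally and the printed latencies are `[12, 12, 12, 1, 1]`. The threshold trades migration traffic against how quickly hot remote data become local; the paper's actual policy may differ.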

[1] Alexey Lastovetsky et al., A Novel Data-Partitioning Algorithm for Performance Optimization of Data-Parallel Applications on Heterogeneous HPC Platforms, 2018, IEEE Transactions on Parallel and Distributed Systems.

[2] Jiang Jiang et al., PSA-NUCA: A Pressure Self-Adapting Dynamic Non-uniform Cache Architecture, 2012, 2012 IEEE Seventh International Conference on Networking, Architecture, and Storage.

[3] Jianhua Li et al., Thread Criticality Assisted Replication and Migration for Chip Multiprocessor Caches, 2017, IEEE Transactions on Computers.

[4] Dajiang Zhou et al., CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks, 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).

[5] Chen Yang et al., HReA: An Energy-Efficient Embedded Dynamically Reconfigurable Fabric for 13-Dwarfs Processing, 2018, IEEE Transactions on Circuits and Systems II: Express Briefs.

[6] Henk Corporaal et al., Coarse grained reconfigurable architectures in the past 25 years: Overview and classification, 2016, 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS).

[7] Leibo Liu et al., CIACP: A Correlation- and Iteration- Aware Cache Partitioning Mechanism to Improve Performance of Multiple Coarse-Grained Reconfigurable Arrays, 2017, IEEE Transactions on Parallel and Distributed Systems.

[8] Sheng Ma et al., DyCache: Dynamic Multi-Grain Cache Management for Irregular Memory Accesses on GPU, 2018, IEEE Access.

[9] Ryan N. Rakvic et al., Replacement techniques for dynamic NUCA cache designs on CMPs, 2013, The Journal of Supercomputing.

[10] Yike Guo et al., A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration, 2016, IEEE Computer Architecture Letters.

[11] Scott A. Mahlke et al., Polymorphic Pipeline Array: A flexible multicore accelerator with virtualized execution for mobile multimedia applications, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12] Leibo Liu et al., Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures, 2016, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[13] Luigi Carro et al., A reconfigurable heterogeneous multicore with a homogeneous ISA, 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[14] Lesley Shannon et al., Design Space Exploration of L1 Data Caches for FPGA-Based Multiprocessor Systems, 2015, FPGA.

[15] Jürgen Teich et al., A reconfigurable memory architecture for system integration of coarse-grained reconfigurable arrays, 2017, 2017 International Conference on ReConFigurable Computing and FPGAs (ReConFig).

[16] Francisco J. Cazorla et al., Random Modulo: A new processor cache design for real-time critical systems, 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).

[17] Weiwei Shan et al., Configuration Cache Management for Coarse-Grained Reconfigurable Architecture with Multi-Array, 2012, 2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery.

[18] Peipei Zhou, A Fully Pipelined and Dynamically Composable Architecture of CGRA (Coarse Grained Reconfigurable Architecture), 2014, FCCM 2014.

[19] Jürgen Schmidhuber et al., Deep learning in neural networks: An overview, 2014, Neural Networks.