Paper information - Reconfigurable Multi-core Architecture -- A Plausible Solution to the Von Neumann Performance Bottleneck

The notorious von Neumann bottleneck has been the main performance hurdle since the invention of computers. Although several techniques, such as separate data and instruction caches, branch prediction, and parallel computing, have been proposed and have improved efficiency, the throughput bottleneck between the CPU and memory persists. We propose a novel reconfigurable multi-core architecture (RMA) that addresses this issue through the dynamic allocation of heterogeneous computing resources and distributed memory. We show that this is feasible with state-of-the-art technologies for dynamic partial reconfiguration of hardware resources and runtime operating system configuration. Experiments and analysis show how RMA alleviates the performance bottleneck.

Jih-Sheng Shen | Pao-Ann Hsiung | Chun-Hsien Lu | Chih-Sheng Lin | Hung-Lin Chao
