Software Controlled Reconfigurable On-Chip Memory for High Performance Computing

The performance gap between processor and memory is a serious problem in high performance computing, because effective performance is limited by the capability of the memory system. To overcome this problem, we propose a new VLSI architecture called SCIMA, which integrates software-controllable memory into the processor chip in addition to an ordinary data cache. Most data accesses in high performance computing are regular, and software-controllable memory exploits this regularity better than a conventional cache. This paper presents the architecture and its performance evaluation. In SCIMA, the ratio of software-controllable memory to cache can be changed dynamically. Because of this feature, SCIMA is upward compatible with conventional memory architectures. Performance is evaluated using the CG and FT kernels of the NAS Parallel Benchmarks (NPB) and a real quantum chromodynamics (QCD) application. The results show that SCIMA outperforms a conventional cache-based architecture, and that its advantage grows as the access latency of off-chip memory increases or as the relative throughput of off-chip memory decreases.
