The performance gap between processor and main memory speeds, known as the memory wall, is a serious problem, especially in High Performance Computing (HPC). The memory wall stems from two factors: large memory access latency and insufficient memory throughput. Many techniques have been proposed for tolerating memory access latency, including cache prefetching. However, these techniques increase main memory traffic [1]. To overcome this problem, we have proposed a new processor architecture, SCIMA (Software Controlled Integrated Memory Architecture), which integrates software-controllable memory (SCM) into the processor chip [2]. Our previous work revealed that SCIMA has the potential to tolerate memory latency without wasting memory bandwidth. We are currently working on strategies for controlling SCM by software. In this presentation, we present our optimization methodology and show its effectiveness.
[1] Hiroshi Nakamura, et al. Performance of lattice QCD programs on CP-PACS. Parallel Computing, 1999.
[2] D. Burger, et al. Memory Bandwidth Limitations of Future Microprocessors. 23rd Annual International Symposium on Computer Architecture (ISCA'96), 1996.
[3] James R. Goodman, et al. Memory Bandwidth Limitations of Future Microprocessors. 23rd Annual International Symposium on Computer Architecture (ISCA'96), 1996.
[4] Hiroshi Nakamura, et al. SCIMA: Software controlled integrated memory architecture for high performance computing. Proceedings 2000 International Conference on Computer Design, 2000.