Applications dominated by gather/scatter operations are difficult to accelerate. These operations cause inefficient cache use within each processor and fine-grained global communication in parallel systems. Several applications have these characteristics, particularly in electrical engineering: for example, circuit simulation and power flow simulation, which apply LU decomposition to random sparse matrices. This paper presents a way to build inexpensive personal supercomputers to solve such problems. To keep benefiting from commercial off-the-shelf (COTS) components after the demise of vector supercomputer vendors, the system is designed without any modification to the CPU, the bridge chips on the motherboard, or the memory chips. Simply plugging in a new memory module with vector load/store and communication functions turns an inexpensive home-use personal computer into a node similar to a node of the Earth Simulator. Applications with unit-stride or indexed accesses are the targets for acceleration. As an example, we show how to accelerate NAS CG.
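To make the access pattern concrete, the following minimal sketch shows a sparse matrix-vector product, the dominant kernel of conjugate gradient methods such as NAS CG. It is an illustration only and is not taken from the paper: the compressed sparse row (CSR) layout, the function name `csr_matvec`, and the small test matrix are all assumptions chosen for clarity.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr, col_idx and values are the usual CSR arrays (an assumed
    layout for this sketch, not one specified by the paper).
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # values[k] and col_idx[k] are unit-stride accesses;
            # x[col_idx[k]] is an indexed (gather) access whose
            # address depends on data, which defeats cache lines.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 matrix: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_matvec(row_ptr, col_idx, values, x)
```

The reads of `values` and `col_idx` are the unit-stride accesses, and the read of `x[col_idx[k]]` is the indexed access; a memory module with vector load/store functions would aim to serve both kinds of stream.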