Lessons learned from porting vector computer applications onto non-uniform memory access scalar machines

Although recent large-scale scalar multiprocessor systems have the potential to outperform vector machines even in application areas traditionally dominated by vector computers, their applicability to such workloads has not been systematically studied. We ported two typical vector applications onto two different scalar NUMA platforms. We found that a seemingly trivial reordering of array dimensions drastically affects performance. We also show that vector-specific programming techniques can degrade performance on scalar NUMA systems. We describe a general workaround that we developed, together with a discussion of the memory-system characteristics of the platforms.
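To illustrate why a dimension swap can matter, the C sketch below contrasts a loop nest written in a vector-friendly style with the same loop nest after the array dimensions are reordered. It is a hypothetical kernel, not code from the ported applications. The names `sum_vector_order`, `sum_reordered`, `reorder_dims`, `layouts_agree` and the sizes `NSHORT`/`NLONG` are assumptions chosen for illustration. In the vector-style version the innermost loop runs over the long index to maximize vector length, but it touches memory with a stride of `NSHORT` elements. After the dimensions are swapped, the same loop order walks contiguous memory, which is generally what cache-based scalar processors favor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes: a short dimension (e.g. field components) and a
   long dimension (e.g. grid points). */
enum { NSHORT = 4, NLONG = 1 << 16 };

/* Vector-style ordering: a[NLONG][NSHORT] (C row-major).
   The innermost loop runs over the long index to get a long vector
   length, but consecutive iterations are NSHORT doubles apart. */
double sum_vector_order(const double *a) {
    double s = 0.0;
    for (size_t j = 0; j < NSHORT; ++j)
        for (size_t i = 0; i < NLONG; ++i)
            s += a[i * NSHORT + j];   /* stride NSHORT */
    return s;
}

/* Reordered dimensions: b[NSHORT][NLONG]. The loop nest is unchanged,
   but the innermost loop now walks contiguous memory. */
double sum_reordered(const double *b) {
    double s = 0.0;
    for (size_t j = 0; j < NSHORT; ++j)
        for (size_t i = 0; i < NLONG; ++i)
            s += b[j * NLONG + i];    /* stride 1 */
    return s;
}

/* Copy a[NLONG][NSHORT] into b[NSHORT][NLONG] (swap the two dimensions). */
void reorder_dims(const double *a, double *b) {
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < NLONG; ++i)
        for (size_t j = 0; j < NSHORT; ++j)
            b[j * NLONG + i] = a[i * NSHORT + j];
}

/* Fill a with exactly representable values, reorder it, and check that
   both layouts give the same sum. Returns 1 on agreement, 0 otherwise
   (including allocation failure). */
int layouts_agree(void) {
    double *a = malloc(sizeof *a * NLONG * NSHORT);
    double *b = malloc(sizeof *b * NLONG * NSHORT);
    int ok = 0;
    if (a != NULL && b != NULL) {
        for (size_t i = 0; i < NLONG; ++i)
            for (size_t j = 0; j < NSHORT; ++j)
                a[i * NSHORT + j] = (double)(i + j);
        reorder_dims(a, b);
        /* Same values summed in the same order, so the results are equal. */
        ok = sum_vector_order(a) == sum_reordered(b);
    }
    free(a);
    free(b);
    return ok;
}
```

Both functions compute the same result; only the memory access pattern differs. Timing them on a particular scalar or NUMA platform is left to the reader, and no measurements from this study are implied by the sketch.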