Scheduling of Algorithms Based on Elimination Trees on NUMA Systems

Data locality is an important issue when executing programs on multiprocessor systems with non-uniform memory access (NUMA) times. Most dynamic scheduling algorithms balance load well but do not take locality into account and therefore perform poorly on NUMA systems. In this paper we present a scheduling algorithm that aims to increase data locality, and thereby performance, for problems based on elimination trees. As a case study, we applied this scheduling to the modified Cholesky factorization. Experimental results on the SGI O2000 are shown.
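For readers unfamiliar with elimination trees, the following minimal Python sketch shows how such a tree can be derived from the nonzero pattern of a sparse symmetric matrix, using the standard ancestor-compression construction. It is only an illustration of the data structure and is not the scheduling algorithm presented in this paper; the function name `elimination_tree` and the input format (for each column, the row indices of its nonzeros above the diagonal) are assumptions made for this example.

```python
def elimination_tree(n, upper_rows):
    """Return parent[] of the elimination tree of an n x n symmetric matrix.

    upper_rows[j] lists the row indices i < j with A[i, j] != 0
    (hypothetical input format chosen for this sketch).
    parent[k] == -1 marks a root.
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed ancestor links
    for j in range(n):
        for i in upper_rows[j]:
            # Walk from i towards the current root of its subtree,
            # redirecting every visited node's ancestor link to j.
            while i != -1 and i < j:
                next_i = ancestor[i]
                ancestor[i] = j
                if next_i == -1:
                    # i had no parent yet, so j becomes its parent.
                    parent[i] = j
                i = next_i
    return parent


# 5 x 5 example with off-diagonal nonzeros at (0,2), (1,2), (2,4), (3,4).
pattern = [[], [], [0, 1], [], [2, 3]]
tree = elimination_tree(5, pattern)  # [2, 2, 4, 4, -1]
```

In this example, columns 0 and 1 are leaves under column 2, and columns 2 and 3 are children of the root, column 4. Nodes in disjoint subtrees do not depend on each other, which is what makes the tree a natural basis for assigning work to processors.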