Implementing malleability on MPI jobs

Parallel jobs are characterized by processes that communicate and synchronize with each other frequently. A processor allocation strategy widely used in parallel supercomputers is space-sharing, that is, assigning a partition of processors to each job for its exclusive use. We present a global solution that offers virtual malleability to message-passing parallel jobs by applying a processor allocation strategy, Folding by JobType (FJT). This technique is based on the concepts of folding and moldability, and it aims to decide the optimal initial number of processes, when to fold jobs, and how many times to fold them, by analyzing current and past system information. At the processor level, we apply co-scheduling. We implement and evaluate FJT under several workloads with different job sizes, job classes and machine utilization levels. Results show that FJT adapts easily to load changes and can outperform the other evaluated strategies on workloads with a high coefficient of variation, especially under bursty arrivals.
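To make the idea of folding concrete: a job folded k times runs its processes on 1/2^k of the processors it originally requested, so each processor time-shares 2^k of the job's processes. Co-scheduling keeps those processes coordinated. The sketch below is a minimal, hypothetical illustration of one folding decision. It picks the smallest number of folds that lets a job fit into the currently free processors. The function name `folding_times` and the `max_folds` cap are assumptions made for illustration only. This is not the FJT decision rule, which also uses job type and historical system information.

```python
from typing import Optional


def folding_times(requested: int, free: int, max_folds: int = 3) -> Optional[int]:
    """Hypothetical folding policy (not the FJT rule from the paper).

    Return the smallest number of folds k (0 <= k <= max_folds) such that a job
    with `requested` processes, folded k times, fits into `free` processors.
    After k folds the job uses requested / 2**k processors, and each processor
    time-shares 2**k processes. Return None if no allowed folding fits.
    """
    for k in range(max_folds + 1):
        factor = 1 << k  # processes per processor after k folds
        if requested % factor != 0:
            continue  # require an even split of processes across processors
        if requested // factor <= free:
            return k
    return None


# A 16-process job facing 8 free processors is folded once (8 processors, 2 processes each).
print(folding_times(16, 8))
```

Choosing the *fewest* folds keeps per-processor multiprogramming low, which matters for tightly synchronized message-passing jobs. A real policy would weigh this against waiting for more processors to become free.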