PADS: Performance-Aware Dynamic Scheduling for Effective MapReduce Computation in Heterogeneous Clusters

Much previous work on MapReduce has improved job completion performance by adding instrumentation modules that collect system-level information for making scheduling decisions. However, the extra instrumentation may not scale well as the number of task trackers increases. To address this, we design PADS, a lightweight scheduler that uses time prediction to schedule tasks without additional instrumentation modules. Results show that PADS improves performance by 6%, 12%, and 9% compared to ESAMR, LA, and DDAS, respectively.