SLO-driven right-sizing and resource provisioning of MapReduce jobs

There is an increasing number of MapReduce applications, e.g., personalized advertising, spam detection, real-time event log analysis, that require completion time guarantees or need to be completed within a given time window. Currently, there is a lack of performance models and workload analy- sis tools available to system administrators for automated performance management of such MapReduce jobs. In this work, we outline a novel framework for SLO-driven resource provisioning and sizing of MapReduce jobs. First, we pro- pose an automated profiling tool that extracts a compact job profile from the past application run(s) or by executing it on a smaller data set. Then, by applying a linear regression technique, we derive scaling factors to accurately project the application performance when processing a larger data- set. The job profile (with scaling factors) forms the basis of a MapReduce performance model that computes the lower and upper bounds on the job completion time. Finally, we provide a fast and efficient capacity planning model that for a MapReduce job with timing requirements generates a set of resource provisioning options. We validate the accuracy of our models by executing a set of realistic applications with different timing requirements on the 66-node Hadoop cluster.