Profit Maximization of Big Data Jobs in Cloud Using Stochastic Optimization

Reserved instances offered by cloud providers make it possible to reserve resources and computing capacity for a specific period of time. The customer pays for every hour of that interval; in exchange, the hourly rate is significantly lower than that of on-demand instances. Reserved instances can therefore significantly reduce the monetary cost of the resources needed to process big data applications in the cloud. However, purchases of these instances are non-refundable, so the required resources must be estimated before purchase to avoid overpayment. This becomes especially important when the results of a big data job have monetary value, as in business intelligence applications. Estimating the resource demand of big data processing jobs is hard, however, because numerous factors affect it, such as data locality, data skew, stragglers, the internal settings of the big data processing framework, interference among instances, and instance availability. To maximize the profit of processing such big data jobs in the cloud, given the fluctuating nature of their resource demand and the limitations of reserved instances, we propose the Reserved Instances Stochastic Allocation (RISA) approach. Using historical traces of the resource demand of big data jobs submitted by a user, RISA leverages stochastic optimization to determine the amount of resources to reserve for that user in order to maximize profit. Our evaluation using real-world traces shows that RISA can increase net profit by up to 10x compared to previous approaches. RISA can also find solutions as close as 2% to the best possible solution.
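
To make the idea of choosing a reservation level from historical demand concrete, the following is a minimal sketch of a sample-average approach. It is not the RISA formulation, which this abstract does not specify. The cost model is an assumption made for illustration: demand above the reserved level is served by on-demand instances, every unit of served demand earns a fixed revenue, and reserved capacity is paid for whether or not it is used. All names (expected_profit, best_reservation, the rate parameters) are hypothetical.

```python
from typing import Sequence


def expected_profit(reserved: int, demand_trace: Sequence[int],
                    revenue_per_unit: float, reserved_rate: float,
                    on_demand_rate: float) -> float:
    """Average per-hour profit over the trace for one reservation level.

    Assumed model (illustrative only): reserved capacity is always paid
    for, and demand above it overflows to on-demand instances.
    """
    total = 0.0
    for demand in demand_trace:
        overflow = max(demand - reserved, 0)  # units served on demand
        revenue = revenue_per_unit * demand
        cost = reserved_rate * reserved + on_demand_rate * overflow
        total += revenue - cost
    return total / len(demand_trace)


def best_reservation(demand_trace: Sequence[int], revenue_per_unit: float,
                     reserved_rate: float, on_demand_rate: float) -> int:
    """Pick the reservation level with the highest average profit.

    Treats each historical observation as an equally likely scenario
    and searches every integer level up to the peak observed demand.
    """
    candidates = range(0, max(demand_trace) + 1)
    return max(
        candidates,
        key=lambda r: expected_profit(r, demand_trace, revenue_per_unit,
                                      reserved_rate, on_demand_rate),
    )


if __name__ == "__main__":
    # Hypothetical hourly demand trace, in instances.
    trace = [2, 4, 4, 6, 10]
    level = best_reservation(trace, revenue_per_unit=2.0,
                             reserved_rate=0.5, on_demand_rate=1.0)
    print("reserve", level, "instances")
```

Under this assumed model, reserving too little pushes demand onto the more expensive on-demand rate, while reserving too much pays for idle capacity; the exhaustive search simply locates the level that balances the two over the observed trace.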