Query and Resource Optimization: Bridging the Gap

Modern big data systems run in cloud environments where resources are shared among several users and applications. As a result, declarative user queries must be optimized and executed over resources that change constantly and are provisioned on demand for each job. This requires us to rethink traditional query optimization, which was designed for systems that run on dedicated resources. In this paper, we present evidence that the choice of query plan depends heavily on the resources on which the plan will be executed. The current practice of determining query plans without accounting for resources could lead to significant performance loss in popular big data systems such as Hive and SparkSQL. We therefore make a case for Query and Resource Optimization (QROP), i.e., choosing the query plan and the resource configuration at the same time, and present a research agenda in this direction.