Astra: Autonomous Serverless Analytics with Cost-Efficiency and QoS-Awareness

With the ability to simplify the code deployment with one-click upload and lightweight execution, serverless computing has emerged as a promising paradigm with increasing popularity. However, there remain open challenges when adapting data-intensive analytics applications to the serverless context, in which users of serverless analytics encounter with the difficulty in coordinating computation across different stages and provisioning resources in a large configuration space. This paper presents our design and implementation of Astra, which configures and orchestrates serverless analytics jobs in an autonomous manner, while taking into account flexibly-specified user requirements. Astra relies on the modeling of performance and cost which characterizes the intricate interplay among multi-dimensional factors (e.g., function memory size, degree of parallelism at each stage). We formulate an optimization problem based on user-specific requirements towards performance enhancement or cost reduction, and develop a set of algorithms based on graph theory to obtain optimal job execution. We deploy Astra in the AWS Lambda platform and conduct real-world experiments over three representative benchmarks with different scales. Results demonstrate that Astra can achieve the optimal execution decision for serverless analytics, by improving the performance of 21% to 60% under a given budget constraint, and resulting in a cost reduction of 20% to 80% without violating performance requirement, when compared with three baseline configuration algorithms.