Cost-effective data analytics across multiple cloud regions

We propose a cloud-native data analytics engine that processes data stored across geographically distributed cloud regions at reduced cost. Each job is split into subtasks, which are placed across regions based on factors including the prices of compute resources and data transmission. We present the engine's architecture, which leverages existing cloud infrastructure, and discuss the major challenges in its system design. Preliminary experiments show a 15.1% cost reduction for a decision support query on a four-region public cloud setup.
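To make the placement idea concrete, the following is a minimal sketch of cost-based subtask placement. It is not the engine's actual algorithm: the region names, prices, task sizes, flat per-GB transfer fee, and the exhaustive search are all illustrative assumptions. Scan subtasks are assumed to be pinned to the region holding their data, and only the remaining subtasks are free to move.

```python
from itertools import product

# Hypothetical prices for illustration only; not taken from the paper.
COMPUTE_PRICE = {"us-east": 0.04, "eu-west": 0.05, "ap-south": 0.03}  # $ per compute-hour
TRANSFER_PRICE = 0.02  # $ per GB moved between two different regions (assumed flat)


def placement_cost(placement, tasks, edges):
    """Return compute cost plus cross-region transfer cost of a placement."""
    compute = sum(COMPUTE_PRICE[placement[t]] * hours for t, hours in tasks.items())
    transfer = sum(
        TRANSFER_PRICE * gb
        for (src, dst), gb in edges.items()
        if placement[src] != placement[dst]  # same-region transfer assumed free
    )
    return compute + transfer


def cheapest_placement(tasks, edges, pinned):
    """Try every region assignment for the non-pinned subtasks; keep the cheapest."""
    free = [t for t in tasks if t not in pinned]
    best_placement, best_cost = None, float("inf")
    for choice in product(COMPUTE_PRICE, repeat=len(free)):
        placement = {**pinned, **dict(zip(free, choice))}
        cost = placement_cost(placement, tasks, edges)
        if cost < best_cost:
            best_placement, best_cost = placement, cost
    return best_placement, best_cost


# Hypothetical job: two scans feed one join.
tasks = {"scan_a": 2.0, "scan_b": 2.0, "join": 1.0}        # compute-hours per subtask
edges = {("scan_a", "join"): 10.0, ("scan_b", "join"): 1.0}  # GB shipped between subtasks
pinned = {"scan_a": "us-east", "scan_b": "eu-west"}          # scans run where the data lives

placement, cost = cheapest_placement(tasks, edges, pinned)
```

With these assumed numbers the join is placed in us-east, next to the larger input, even though ap-south has the cheapest compute: moving 10 GB out of us-east would cost more than the compute savings. Exhaustive search is only workable for a handful of subtasks; a real planner would need a heuristic or an optimization solver.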
