Fangorn: Adaptive Execution Framework for Heterogeneous Workloads on Shared Clusters

Pervasive demand for data exploration at all scales has populated modern distributed platforms with workloads of different characteristics. The growing complexity and diversity of these workloads pose distinct challenges for executing them on shared clusters in corporate or public clouds. This paper presents Fangorn, an adaptive execution framework built on an enriched graph model. As the underlying infrastructure for the core computation platforms at Alibaba, Fangorn supports various execution modes and caters to heterogeneous workloads. Because it can orchestrate graph executions with both long-running and on-demand resources at the same time, Fangorn allows jobs of all scales to explore tradeoffs between latency and resource efficiency. By modeling distributed job executions as mutable graphs with pluggable components, Fangorn offers a systematic framework for adjusting job executions adaptively, according to data statistics collected at run time. Fangorn supports an array of computation engines ranging from relational to deep learning, and is fully deployed on production clusters across Alibaba. It manages tens of millions of distributed jobs daily, with job sizes ranging from one to half a million.

PVLDB Reference Format:
Yingda Chen, Jiamang Wang, Yifeng Lu, Ying Han, Zhiqiang Lv, Xuebin Min, Hua Cai, Wei Zhang, Haochuan Fan, Chao Li, Tao Guan, Wei Lin, Yangqing Jia and Jingren Zhou. Fangorn: Adaptive Execution Framework for Heterogeneous Workloads on Shared Clusters. PVLDB, 14(12): 2972 -
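The abstract describes modeling job executions as mutable graphs with pluggable components that adjust execution according to run-time data statistics. The Python sketch below illustrates that general idea under stated assumptions. The names `JobGraph`, `AutoParallelismPlugin`, `AdaptiveExecutor`, `report_finished`, and the `bytes_per_instance` policy are hypothetical and are not Fangorn's API. The specific adaptation shown, resizing downstream parallelism from the observed output size of a finished upstream vertex, is one common example of run-time graph adjustment, chosen here only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple

# Hypothetical, minimal mutable job graph; not Fangorn's actual data model.


@dataclass
class Vertex:
    name: str
    parallelism: int  # number of instances planned for this vertex


@dataclass
class JobGraph:
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (upstream, downstream)

    def add_vertex(self, name: str, parallelism: int) -> None:
        self.vertices[name] = Vertex(name, parallelism)

    def add_edge(self, upstream: str, downstream: str) -> None:
        self.edges.append((upstream, downstream))

    def downstream_of(self, name: str) -> List[str]:
        return [d for (u, d) in self.edges if u == name]


class GraphPlugin(Protocol):
    """Hypothetical pluggable component notified when a vertex finishes."""

    def on_vertex_finished(self, graph: JobGraph, vertex: str, output_bytes: int) -> None: ...


class AutoParallelismPlugin:
    """Hypothetical plugin: resize downstream parallelism from observed output size."""

    def __init__(self, bytes_per_instance: int, max_parallelism: int) -> None:
        self.bytes_per_instance = bytes_per_instance
        self.max_parallelism = max_parallelism

    def on_vertex_finished(self, graph: JobGraph, vertex: str, output_bytes: int) -> None:
        # Ceiling division: enough instances that each reads at most
        # bytes_per_instance, but never fewer than one.
        wanted = max(1, -(-output_bytes // self.bytes_per_instance))
        for down in graph.downstream_of(vertex):
            # Mutate the graph in place before the downstream vertex is scheduled.
            graph.vertices[down].parallelism = min(wanted, self.max_parallelism)


@dataclass
class AdaptiveExecutor:
    graph: JobGraph
    plugins: List[GraphPlugin] = field(default_factory=list)

    def report_finished(self, vertex: str, output_bytes: int) -> None:
        # Feed run-time statistics to every registered plugin.
        for plugin in self.plugins:
            plugin.on_vertex_finished(self.graph, vertex, output_bytes)


if __name__ == "__main__":
    g = JobGraph()
    g.add_vertex("map", 100)
    g.add_vertex("reduce", 1000)  # static plan guessed 1000 instances
    g.add_edge("map", "reduce")
    ex = AdaptiveExecutor(g, [AutoParallelismPlugin(256 * 2**20, 1000)])
    ex.report_finished("map", 3 * 2**30)  # map actually produced 3 GiB
    print(g.vertices["reduce"].parallelism)  # prints 12
```

Keeping the adaptation logic in a separate plugin, rather than inside the executor, mirrors the abstract's emphasis on pluggable components: other policies (for example, skew handling) could be registered alongside it without changing the graph or executor.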
