FLOWPROPHET: Generic and Accurate Traffic Prediction for Data-Parallel Cluster Computing

Data-parallel computing frameworks (DCFs) such as MapReduce, Spark, and Dryad are widely used in big data and cloud computing, and they inject large numbers of flows into data center networks. In this paper, we design and implement FLOWPROPHET, a general framework to predict traffic flows for DCFs. To this end, we analyze and summarize the common features of popular DCFs and gain a key insight: since application logic in DCFs is naturally expressed as a directed acyclic graph (DAG), the DAG contains the time and data dependencies necessary for accurate flow prediction. Based on this insight, FLOWPROPHET extracts DAGs from user applications and uses the time and data dependencies to calculate, ahead of time and for all flows, the flow information 4-tuple (source, destination, flow_size, establish_time). We also provide a generic programming interface to FLOWPROPHET, so that current and future DCFs can deploy it readily. We implement FLOWPROPHET on both Spark and Hadoop and perform extensive evaluations on a testbed with 37 physical servers. Our implementation and experiments demonstrate that FLOWPROPHET can predict flows ahead of time and at minimal cost, with almost 100% accuracy in source, destination, and flow size. With accurate predictions from FLOWPROPHET, the job completion time of a Hadoop TeraSort benchmark is reduced by 12.52% on our cluster with a simple network scheduler.
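To make the idea of deriving the 4-tuple from a DAG concrete, the following is a minimal illustrative sketch, not FLOWPROPHET's actual implementation or programming interface. All names (`Task`, `Flow`, `predict_flows`) are hypothetical. It assumes each stage's task placement, output size, and finish time are already known or estimated, that a parent task's output is split evenly across the child stage's tasks, and that a child stage starts fetching only once every parent task has finished; real frameworks may differ on each of these points.

```python
from dataclasses import dataclass
from typing import Dict, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """The 4-tuple named in the abstract."""
    source: str
    destination: str
    flow_size: int
    establish_time: float


@dataclass
class Task:
    """Hypothetical per-task record; fields are assumptions for this sketch."""
    host: str            # server the task is placed on
    output_bytes: int    # bytes the task produces for the next stage
    finish_time: float   # estimated completion time in seconds


def predict_flows(stages: Dict[str, List[Task]],
                  edges: List[Tuple[str, str]]) -> List[Flow]:
    """Walk the DAG edges and emit one predicted flow per (producer, consumer) pair."""
    flows: List[Flow] = []
    for parent, child in edges:
        consumers = stages[child]
        # Time dependency (assumed barrier): the child stage starts fetching
        # only after the slowest parent task has finished.
        ready = max(task.finish_time for task in stages[parent])
        for producer in stages[parent]:
            # Data dependency (assumed uniform partitioning): each consumer
            # receives an equal share of the producer's output.
            share = producer.output_bytes // len(consumers)
            for consumer in consumers:
                if producer.host != consumer.host:  # local reads create no network flow
                    flows.append(Flow(producer.host, consumer.host, share, ready))
    return flows


# Toy two-stage job: two map tasks feeding two reduce tasks.
stages = {
    "map": [Task("h1", 400, 2.0), Task("h2", 600, 3.0)],
    "reduce": [Task("h2", 0, 9.0), Task("h3", 0, 9.0)],
}
edges = [("map", "reduce")]
flows = predict_flows(stages, edges)
for flow in flows:
    print(flow)
```

In this toy job the map task on h2 and the reduce task on h2 share a host, so that pair produces no network flow and three flows are predicted in total.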
