Time Series Analysis for Efficient Sample Transfers

Real-time transfer optimization approaches offer promising solutions because they can discover the optimal transfer configuration at runtime without requiring upfront work or making assumptions about the underlying system architecture. However, existing implementations converge slowly because they run many sample transfers with suboptimal configurations. In this work, we evaluate time-series models to minimize the impact of such sample transfers by shortening their duration without degrading estimation accuracy. Results gathered in various networks with a rich set of transfer configurations indicate that, in most cases, an autoregressive model can accurately estimate sample transfer throughput in less than 5 seconds, which is up to a 4x improvement over the state-of-the-art solution. We also observe that, while the most common transfer applications report throughput at most once per second, decreasing the reporting interval is key to further reducing the impact of sample transfers, because it allows their performance to be determined more quickly.
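As an illustration of the idea, the following is a minimal sketch of how a first-order autoregressive model could turn the first few throughput reports of a sample transfer into an estimate of its steady-state throughput. It is not the implementation evaluated in this work: the model order (AR(1)), the least-squares fit, the fallback rule, and the function names `fit_ar1` and `estimate_steady_throughput` are all assumptions made for this example.

```python
from statistics import mean


def fit_ar1(samples):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi).

    `samples` is a list of periodic throughput reports (e.g. in Mbps).
    Hypothetical helper, not taken from the paper.
    """
    if len(samples) < 3:
        raise ValueError("need at least 3 throughput samples")
    prev, curr = samples[:-1], samples[1:]
    mean_prev, mean_curr = mean(prev), mean(curr)
    var = sum((p - mean_prev) ** 2 for p in prev)
    if var == 0:
        # Lagged values are all identical: fall back to a constant model.
        return mean_curr, 0.0
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    phi = cov / var
    return mean_curr - phi * mean_prev, phi


def estimate_steady_throughput(samples):
    """Estimate long-run throughput as the AR(1) mean c / (1 - phi)."""
    c, phi = fit_ar1(samples)
    if abs(phi) >= 1:
        # Assumed fallback: the fit is non-stationary, so use the last report.
        return samples[-1]
    return c / (1 - phi)


# Throughput still ramping up: 0, 100, 150, 175, 187.5 (converging to 200).
ramp = [0.0, 100.0, 150.0, 175.0, 187.5]
print(estimate_steady_throughput(ramp))
```

In this sketch, a shorter reporting interval yields more samples within the same wall-clock time, so the fit can be attempted earlier in the transfer.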
