Toward a smart data transfer node

Abstract

Scientific computing systems are becoming significantly more complex, with distributed teams and intricate workflows that span resources ranging from telescopes and light sources to high-speed networks and Internet of Things sensor systems. In such settings, no single centralized administrative team or software stack can coordinate and manage all of the resources used by a single application. Indeed, current human-in-the-loop management techniques have reached a critical limit. We therefore argue that resources must begin to respond automatically, adapting and tuning their behavior to the observed properties of scientific workflows. Over time, machine learning methods can identify effective strategies for autonomic, goal-driven management that can be applied end-to-end across the scientific computing landscape. Using the data transfer nodes widely deployed in modern research networks as an example, we explore the architecture, methods, and algorithms needed for a smart data transfer node that supports future self-tuning, self-managing scientific computing systems.
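To make the idea of a resource that tunes itself from observed behavior more concrete, the following is a minimal, hypothetical sketch and not the method proposed here. It assumes that a data transfer node can choose among a fixed set of concurrency levels (parallel streams) and can observe the throughput each choice achieves. It uses an epsilon-greedy multi-armed bandit, one simple learning technique chosen purely for illustration. The throughput model `simulated_throughput`, its assumed optimum of 8 streams, and all class and function names are invented stand-ins for real measurements.

```python
import random


def simulated_throughput(concurrency: int, rng: random.Random) -> float:
    """Hypothetical stand-in for a measured transfer rate (Gb/s).

    Throughput rises with concurrency until the path saturates and then
    falls as contention grows; Gaussian noise mimics run-to-run variation.
    """
    peak = 8  # assumed optimal number of concurrent streams (illustrative only)
    base = 10.0 - 0.15 * (concurrency - peak) ** 2
    return max(0.1, base + rng.gauss(0.0, 0.5))


class EpsilonGreedyTuner:
    """Epsilon-greedy bandit that picks a concurrency level per transfer."""

    def __init__(self, levels, epsilon=0.1, seed=0):
        self.levels = list(levels)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {c: 0 for c in self.levels}
        self.means = {c: 0.0 for c in self.levels}

    def choose(self) -> int:
        # Try every level once before exploiting the current estimates.
        for c in self.levels:
            if self.counts[c] == 0:
                return c
        # With probability epsilon, explore a random level.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.levels)
        return self.best()

    def update(self, level: int, observed: float) -> None:
        # Incremental running mean of observed throughput for this level.
        self.counts[level] += 1
        n = self.counts[level]
        self.means[level] += (observed - self.means[level]) / n

    def best(self) -> int:
        # Level with the highest estimated mean throughput so far.
        return max(self.levels, key=lambda c: self.means[c])


def run(episodes=500, seed=0):
    """Simulate a sequence of transfers, learning from each observation."""
    tuner = EpsilonGreedyTuner(levels=[1, 2, 4, 8, 16, 32], epsilon=0.1, seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(episodes):
        c = tuner.choose()
        tuner.update(c, simulated_throughput(c, env_rng))
    return tuner


if __name__ == "__main__":
    tuner = run()
    print("preferred concurrency:", tuner.best())
```

The bandit keeps exploring a small fraction of the time, so it can still notice if network conditions shift. A production system would more likely use richer state, such as file sizes, path characteristics, and competing load, and a correspondingly richer learning method.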