Boosting HPC Applications in the Cloud Through JIT Traffic-Aware Path Provisioning

Data centers, clusters, and grids have historically supported High-Performance Computing (HPC) applications. Because of the high capital and operational expenditures associated with such infrastructures, the recent past has seen consistent efforts to run HPC applications in the cloud. The potential advantages of this shift include higher scalability and lower costs. While application instantiation through customized Virtual Machines (VMs) is a well-solved problem, the network still represents a significant bottleneck. When HPC applications are moved to the cloud, we lose control over where VMs are placed and over the paths that processes traverse to communicate with one another. To alleviate this problem, and taking advantage of recent advances in programmable networks, we propose a mechanism for dynamic, just-in-time (JIT) path provisioning in cloud infrastructures. It continuously monitors network conditions and, given the application's current communication patterns, systematically (re)programs paths to avoid congested links and reduce end-to-end delays. Compared with the traditional static shortest-path approach, the proposed mechanism achieves a speedup of up to 44.24% in application runtime.
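To make the core idea concrete, the following is a minimal sketch contrasting a static shortest-path choice with a traffic-aware one. It assumes per-link utilization has already been measured (the values in the example are invented for illustration) and uses a hypothetical link cost of 1 / (1 - utilization) with a 0.95 cutoff. This is a generic congestion-weighted Dijkstra search, not necessarily the cost function or controller logic of the proposed mechanism; names such as `traffic_aware_path` and `build_graph` are illustrative only.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Adjacency map: node -> {neighbor: utilization in [0, 1)}.
# Utilization values stand in for (hypothetical) monitoring samples.
Graph = Dict[str, Dict[str, float]]


def build_graph(links: List[Tuple[str, str, float]]) -> Graph:
    """Build an undirected graph from (a, b, utilization) tuples."""
    graph: Graph = {}
    for a, b, util in links:
        graph.setdefault(a, {})[b] = util
        graph.setdefault(b, {})[a] = util
    return graph


def dijkstra(graph: Graph, src: str, dst: str,
             cost: Callable[[float], float]) -> List[str]:
    """Return the lowest-cost path from src to dst under a per-link cost function."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == dst:
            break
        for v, util in graph[u].items():
            nd = d + cost(util)
            # An infinite cost never beats the default, so unusable links are skipped.
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        raise ValueError(f"no usable path from {src} to {dst}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def static_shortest_path(graph: Graph, src: str, dst: str) -> List[str]:
    # Static baseline: every link costs one hop, regardless of its load.
    return dijkstra(graph, src, dst, lambda util: 1.0)


def traffic_aware_path(graph: Graph, src: str, dst: str,
                       max_util: float = 0.95) -> List[str]:
    # Hypothetical cost: grows as 1 / (1 - utilization), so nearly saturated
    # links become expensive; links at or above max_util are treated as unusable.
    def cost(util: float) -> float:
        if util >= max_util:
            return float("inf")
        return 1.0 / (1.0 - util)
    return dijkstra(graph, src, dst, cost)


if __name__ == "__main__":
    # Two hosts behind switches s1 and s2; the direct s1-s2 link is 90% loaded,
    # while a one-hop-longer detour through s3 is lightly loaded.
    g = build_graph([("h1", "s1", 0.1), ("s1", "s2", 0.9), ("s2", "h2", 0.1),
                     ("s1", "s3", 0.1), ("s3", "s2", 0.1)])
    print(static_shortest_path(g, "h1", "h2"))  # ['h1', 's1', 's2', 'h2']
    print(traffic_aware_path(g, "h1", "h2"))    # ['h1', 's1', 's3', 's2', 'h2']
```

In a JIT setting, such a computation would be rerun whenever new monitoring samples arrive, and any flow whose best path changed would be reprogrammed (for example, by an SDN controller installing new forwarding rules); that control loop is omitted from this sketch.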