Harnessing Data Movement in Virtual Clusters for In-Situ Execution

As data volume and velocity increase, Big Data science at exascale has shifted toward the in-situ paradigm, in which large-scale simulations run concurrently with data analytics. In-situ execution lets data generated by simulations be processed while still in memory, avoiding the slow storage bottleneck. However, as demonstrated in this work, running simulations and analytics together on shared resources is likely to cause substantial contention if left unmanaged, which greatly reduces the efficiency of both. Recently, virtualization technologies such as Linux containers have been widely applied in data centers and physical clusters to provide highly efficient and elastic resource provisioning for consolidated workloads, including scientific simulations and data analytics. In this paper, we investigate how to facilitate network traffic manipulation and reduce mutual interference on the network for in-situ applications in virtual clusters. To allocate network bandwidth dynamically when it is needed, we adopt SARIMA-based techniques to analyze and predict the MPI traffic issued by simulations. Although this technique can be effective, naïve use of network virtualization can degrade performance for bursty asynchronous transmissions within an MPI job. We analyze and resolve this performance degradation in virtual clusters.
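The SARIMA-based prediction mentioned above can be illustrated with a minimal sketch. The abstract does not give the model orders, the sampling interval, or the fitting procedure, so everything here is an assumption: the sketch uses the simplest seasonal form, SARIMA(1,0,0)(0,1,0)_s, in which the series is differenced at the seasonal lag and a single autoregressive coefficient is fitted by least squares. The function names, the `period` parameter (samples per simulation iteration), and the traffic values are hypothetical.

```python
def seasonal_difference(series, period):
    """Return y[t] - y[t - period] for every t where both samples exist."""
    return [series[i] - series[i - period] for i in range(period, len(series))]


def fit_ar1(diffed):
    """Least-squares AR(1) coefficient (no intercept) for the differenced series."""
    num = sum(diffed[i] * diffed[i - 1] for i in range(1, len(diffed)))
    den = sum(d * d for d in diffed[:-1])
    return num / den if den else 0.0


def forecast_traffic(series, period, steps):
    """Forecast `steps` future samples of per-interval traffic volume.

    Assumed model: SARIMA(1,0,0)(0,1,0) with seasonal lag `period`.
    """
    if len(series) < period + 2:
        raise ValueError("need at least period + 2 samples")
    diffed = seasonal_difference(series, period)
    phi = fit_ar1(diffed)
    history = list(series)
    last_diff = diffed[-1]
    predictions = []
    for _ in range(steps):
        last_diff = phi * last_diff              # AR(1) step on the differenced series
        value = history[-period] + last_diff     # undo the seasonal differencing
        history.append(value)
        predictions.append(max(value, 0.0))      # traffic volume cannot be negative
    return predictions


# Hypothetical trace: one communication burst per 4-sample iteration.
trace = [0, 0, 10, 0] * 5
print(forecast_traffic(trace, period=4, steps=4))
```

A bandwidth controller could, under these assumptions, reserve capacity for the simulation only in the intervals where the forecast is nonzero and release it to the analytics otherwise. A production model would normally be fitted with an established time-series library and validated against measured traffic.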
