Softening Up the Network for Scientific Applications

Scientific applications demand large amounts of computational power connected through fast networks. They are developed using parallel kernel methods, usually implemented with the Message Passing Interface (MPI), and exhibit well-behaved communication patterns across computing nodes. Current network technologies do not allow traffic forwarding policies to be defined per application traffic class, which results in an unbalanced load on the network links. Moreover, network devices do not distinguish between latency-sensitive and bandwidth-intensive traffic. To address this, we present NetSA, a framework that exploits the communication patterns of scientific applications, together with their latency and bandwidth constraints, as the key logic for evenly placing application flows across the available network paths. Through NetSA, the scientific application developer can easily modify the network behavior to best fit the application's communication requirements. We performed experiments on optimizing the MPI communication primitives and applied our solution to speed up scientific applications, obtaining an execution time reduction of up to 27%.
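The idea of placing flows according to latency and bandwidth constraints can be illustrated with a small sketch. This is not NetSA's actual algorithm, which the abstract does not specify; it is a simple greedy heuristic under stated assumptions. The names `Path`, `Flow`, and `place_flows`, and all numeric values, are hypothetical and chosen only for illustration. Latency-sensitive flows prefer the lowest-latency path with enough spare capacity, while bandwidth-intensive flows prefer the least-utilized path.

```python
from dataclasses import dataclass


@dataclass
class Path:
    # Hypothetical description of one network path between two nodes.
    name: str
    latency_us: float
    capacity_mbps: float
    load_mbps: float = 0.0

    @property
    def utilization(self) -> float:
        return self.load_mbps / self.capacity_mbps


@dataclass
class Flow:
    # Hypothetical description of one application flow.
    name: str
    demand_mbps: float
    latency_sensitive: bool


def place_flows(flows, paths):
    """Greedy placement (illustrative assumption, not the NetSA method)."""
    placement = {}
    # Place the largest flows first so they do not get squeezed in last.
    for flow in sorted(flows, key=lambda f: f.demand_mbps, reverse=True):
        # Prefer paths that still have room; fall back to all paths otherwise.
        fits = [p for p in paths
                if p.load_mbps + flow.demand_mbps <= p.capacity_mbps]
        candidates = fits or paths
        if flow.latency_sensitive:
            # Lowest latency first, utilization as a tie-breaker.
            best = min(candidates, key=lambda p: (p.latency_us, p.utilization))
        else:
            # Least-utilized path first, latency as a tie-breaker.
            best = min(candidates, key=lambda p: (p.utilization, p.latency_us))
        best.load_mbps += flow.demand_mbps
        placement[flow.name] = best.name
    return placement


paths = [Path("A", latency_us=10, capacity_mbps=1000),
         Path("B", latency_us=20, capacity_mbps=1000)]
flows = [Flow("bulk1", 600, latency_sensitive=False),
         Flow("bulk2", 600, latency_sensitive=False),
         Flow("ctrl", 10, latency_sensitive=True)]
print(place_flows(flows, paths))
```

In this toy setup the two bulk flows end up on different paths, and the small latency-sensitive flow takes the lower-latency path. A real deployment would derive flow classes from the application's MPI communication pattern rather than from hand-written flags.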
