Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring

Stealing network bandwidth helps a variety of HPC runtimes and services run additional operations in the background without negatively affecting the applications. A key ingredient to make this possible is an accurate prediction of future network utilization, enabling the runtime to plan background operations in advance so as to avoid competing with the application for network bandwidth. In this paper, we propose a portable deep learning predictor that uses only the information available through MPI introspection to construct a recurrent sequence-to-sequence neural network capable of forecasting network utilization. We leverage the fact that most HPC applications exhibit periodic behavior to enable predictions far into the future (at least the length of a period). Our online approach does not require an initial training phase; instead, it continuously improves itself during application execution without incurring significant computational overhead. Experimental results show better accuracy and lower computational overhead compared with the state of the art on two representative applications.
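The paper's predictor is a sequence-to-sequence recurrent network; as a rough, hypothetical illustration of the periodicity assumption it exploits (not the paper's actual method), a seasonal-naive baseline can forecast a full period ahead by detecting the dominant period of a utilization trace via autocorrelation and repeating the last observed period:

```python
import numpy as np

def dominant_period(signal, min_lag=2):
    """Estimate the dominant period of a 1-D trace via autocorrelation."""
    x = signal - signal.mean()
    # autocorrelation for non-negative lags
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    # pick the lag (>= min_lag) with the highest autocorrelation
    return int(np.argmax(ac[min_lag:]) + min_lag)

def seasonal_naive_forecast(history, horizon):
    """Forecast `horizon` steps by tiling the last detected period."""
    p = dominant_period(history)
    last_period = history[-p:]
    reps = int(np.ceil(horizon / p))
    return np.tile(last_period, reps)[:horizon]

# synthetic "network utilization" trace with period 20
t = np.arange(200)
trace = np.sin(2 * np.pi * t / 20) + 0.5

forecast = seasonal_naive_forecast(trace, horizon=40)
```

Any learned predictor, such as the seq2seq network proposed here, must at minimum outperform this kind of baseline on periodic HPC traces to justify its computational cost.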
