A data streaming model in MPI

The data streaming model is an effective way to tackle the challenges of data-intensive applications. As traditional HPC applications generate large volumes of data and more data-intensive applications move to HPC infrastructures, it is necessary to investigate the feasibility of combining the message-passing and streaming programming models. MPI, the de facto standard for programming HPC systems, cannot intuitively express the communication patterns and functional operations required by streaming models. In this work, we designed and implemented MPIStream, a data streaming library built atop MPI, to allocate data producers and consumers, to stream data continuously or irregularly, and to process data at run-time. In the same spirit as the STREAM benchmark, we developed a parallel stream benchmark to measure the data processing rate. The performance of the library depends largely on the size of the stream element, the number of data producers and consumers, and the computational intensity of processing one stream element. With 2,048 data producers and 2,048 data consumers in the parallel benchmark, MPIStream achieved a processing rate of 200 GB/s on a Blue Gene/Q supercomputer. We illustrate that a streaming library for HPC applications can effectively enable irregular parallel I/O, application monitoring, and threshold collective operations.
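As a rough illustration of what allocating data producers and consumers involves, the sketch below partitions a set of process ranks into two groups and assigns each producer a consumer to stream to. It is not the MPIStream API: the function names, the contiguous rank split, and the round-robin mapping are all assumptions made for illustration. In an MPI program the same split would typically be expressed with a communicator split such as `MPI_Comm_split`. The last helper simply divides the reported aggregate rate by the number of consumers to give an average per-consumer figure.

```python
from typing import Dict, List, Tuple


def split_roles(world_size: int, n_producers: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: the first n_producers ranks produce, the rest consume."""
    if not 0 < n_producers < world_size:
        raise ValueError("need at least one producer and one consumer")
    producers = list(range(n_producers))
    consumers = list(range(n_producers, world_size))
    return producers, consumers


def map_producers(producers: List[int], consumers: List[int]) -> Dict[int, int]:
    """Hypothetical helper: assign each producer a consumer in round-robin order."""
    return {p: consumers[i % len(consumers)] for i, p in enumerate(producers)}


def per_consumer_rate(aggregate_gb_s: float, n_consumers: int) -> float:
    """Average processing rate per consumer, given an aggregate rate in GB/s."""
    return aggregate_gb_s / n_consumers


if __name__ == "__main__":
    # Assumed toy configuration: 8 ranks, 6 producers, 2 consumers.
    producers, consumers = split_roles(world_size=8, n_producers=6)
    print(map_producers(producers, consumers))
    # Aggregate rate and consumer count taken from the benchmark result above.
    print(per_consumer_rate(200.0, 2048))
```

With the figures reported for the parallel benchmark, the average works out to a little under 0.1 GB/s per data consumer.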
