SMPI Courseware: Teaching Distributed-Memory Computing with MPI in Simulation

High Performance Computing (HPC) courses typically give students access to HPC platforms so that they can benefit from hands-on learning opportunities. Using such platforms, however, comes with logistical and pedagogical challenges. A logistical challenge is that students must be granted access to representative platforms, which can be difficult for some institutions or course modalities; a pedagogical challenge is that hands-on learning opportunities are constrained by the configurations of these platforms. One way to address these challenges is to instead simulate program executions on arbitrary HPC platform configurations. In this work, we focus on simulation in the specific context of distributed-memory computing and MPI programming education. While using simulation in this context has been explored in previous work, our approach offers two crucial advantages. First, students write standard MPI programs and can both debug and analyze the performance of their programs in simulation mode. Second, large-scale executions can be simulated in short amounts of time on a single standard laptop computer. This is possible thanks to SMPI, an MPI simulator provided as part of SimGrid. After detailing the challenges involved in using HPC platforms for HPC education and providing background information about SMPI, we present SMPI Courseware, a set of in-simulation assignments that can be incorporated into HPC courses to give students hands-on experience in support of distributed-memory computing and MPI programming learning objectives. We describe some of these assignments, highlighting how simulation with SMPI enhances the student learning experience.
