The Effect of Network Noise on Large-Scale Collective Communications

The effect of operating system (OS) noise on the performance of large-scale applications is a growing concern, and ameliorating its influence is a subject of active research. A related problem is network noise, which arises when the interconnection network is shared with parallel processes from different allocations or with other background activities. To characterize the effect of network noise on parallel applications, we conducted a series of experiments with a specially crafted benchmark and with simulations. Experimental results show that the communication performance of a parallel reduction operation decreases by a factor of 2 on 246 nodes of an InfiniBand fat-tree and by several orders of magnitude on a BlueGene/P torus. Simulations show how the influence of network noise grows with system size. Although network noise is not as well studied as OS noise, our results clearly show that it is an important factor that must be considered when running and analyzing large-scale applications.
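To make the idea of noise growing with system size concrete, the following is a minimal toy sketch, not the benchmark or simulator used in this work. It assumes a binomial-tree reduction in which every message costs a fixed base latency and, with some probability, an extra delay standing in for network noise. The function names and all parameter values (`base_latency`, `noise_prob`, `noise_delay`) are hypothetical choices for illustration only.

```python
import random


def reduce_time(num_procs, base_latency=1.0, noise_prob=0.01,
                noise_delay=50.0, rng=None):
    """Completion time of one binomial-tree reduction to rank 0 (toy model).

    Each message costs base_latency; with probability noise_prob it is
    additionally delayed by noise_delay (a stand-in for network noise).
    All parameters are illustrative assumptions, not measured values.
    """
    if rng is None:
        rng = random.Random(0)
    ready = [0.0] * num_procs  # time at which each rank holds its partial result
    step = 1
    while step < num_procs:
        # In this round, rank r receives from rank r + step.
        for receiver in range(0, num_procs, 2 * step):
            sender = receiver + step
            if sender < num_procs:
                delay = base_latency
                if rng.random() < noise_prob:
                    delay += noise_delay
                # The receiver can continue only after the message arrives.
                ready[receiver] = max(ready[receiver], ready[sender] + delay)
        step *= 2
    return ready[0]


def mean_reduce_time(num_procs, trials=200, seed=0, **kwargs):
    """Average completion time over several independent noisy trials."""
    rng = random.Random(seed)
    total = sum(reduce_time(num_procs, rng=rng, **kwargs) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for p in (16, 64, 256, 1024):
        print(p, mean_reduce_time(p))
```

In this model, a single delayed message stalls every rank above it in the tree, so the chance that at least one delay lands on the critical path rises as more processes take part. Real networks differ in topology, routing, and congestion behavior, so the numbers printed by this sketch carry no quantitative meaning.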
