There goes the neighborhood: Performance degradation due to nearby jobs

Predictable performance is important for understanding and alleviating application performance issues; quantifying the effects of source code, compiler, or system software changes; estimating the time required for batch jobs; and sizing the allocation requests made in proposals. Our experiments show that on a Cray XE system, the execution time of a communication-heavy parallel application ranges from 28% faster to 41% slower than the average observed performance. Blue Gene systems, on the other hand, show no noticeable run-to-run variability. In this paper, we focus on Cray machines and investigate potential causes of performance variability, such as OS jitter, the shape of the allocated partition, and interference from other jobs sharing the same network links. Reducing such variability could improve overall throughput at a computer center and reduce energy costs.
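
To make the "faster or slower than the average" framing concrete, here is a minimal sketch of how such a spread could be computed from repeated runs of one job. It assumes the figures are deviations of execution time from the mean, which the abstract does not spell out. The function name `relative_spread` and the sample timings are hypothetical and are not data from the paper.

```python
from statistics import mean


def relative_spread(times):
    """Return (fastest, slowest) deviation from the mean run time, in percent.

    Negative values mean a run finished faster than the average run;
    positive values mean it finished slower.
    """
    if not times:
        raise ValueError("need at least one timing")
    avg = mean(times)
    fastest = (min(times) - avg) / avg * 100.0
    slowest = (max(times) - avg) / avg * 100.0
    return fastest, slowest


# Hypothetical wall-clock times in seconds (illustrative only).
sample_times = [80.0, 100.0, 120.0]
fast_pct, slow_pct = relative_spread(sample_times)
print(f"{fast_pct:+.1f}% to {slow_pct:+.1f}% relative to the mean")
```

Measuring against the mean rather than the best run keeps the two extremes on one scale, which matches how the range is stated above.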
