Clustering performance data efficiently at massive scales

Existing supercomputers have hundreds of thousands of processor cores, and future systems may have hundreds of millions. Developers need detailed performance measurements to tune their applications and to exploit these systems fully. However, extreme scales pose unique challenges for performance-tuning tools, which can generate significant volumes of I/O. Compute-to-I/O ratios have increased drastically as systems have grown, and the I/O systems of large machines can handle the peak load from only a small fraction of their cores. Tool developers therefore need efficient techniques to analyze and reduce performance data from large numbers of cores. We introduce CAPEK, a novel parallel clustering algorithm that enables in-situ analysis of performance data at run time. Our algorithm scales sub-linearly to 131,072 processes and runs in less than one second even at that scale, which is fast enough for on-line use in production runs. The CAPEK implementation is fully generic and can be used for many types of analysis. We demonstrate its application to statistical trace sampling: we use our algorithm to efficiently compute stratified sampling strategies for traces at run time. We show that such stratification can reduce data volume by up to four orders of magnitude on current large-scale systems, with potential for greater reductions on future extreme-scale systems.
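
As an illustration of why stratification reduces trace volume, the sketch below draws a proportional sample of process ranks from each cluster (stratum). It is not the CAPEK implementation: the function name stratified_sample, the fraction parameter, the rule of keeping at least one rank per stratum, and the example cluster sizes are all assumptions made for this example, and the cluster labels are taken as already computed.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Pick a proportional sample of process ranks from each cluster (stratum).

    labels   -- labels[rank] is the cluster id assigned to that process
    fraction -- share of each stratum to trace, in (0, 1]
    Returns a sorted list of ranks; every stratum contributes at least one.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Group ranks by cluster id.
    strata = defaultdict(list)
    for rank, cluster in enumerate(labels):
        strata[cluster].append(rank)

    rng = random.Random(seed)
    chosen = []
    for cluster in sorted(strata):
        members = strata[cluster]
        # Proportional allocation, rounded, with a floor of one rank per stratum
        # (an assumption of this sketch so that small clusters stay visible).
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


# Hypothetical example: 1,000 processes falling into three behavior classes.
labels = [0] * 900 + [1] * 90 + [2] * 10
sample = stratified_sample(labels, fraction=0.01)
print(len(sample), "of", len(labels), "ranks traced")
```

Only the sampled ranks would write trace data, so the I/O volume scales with the sample size rather than with the process count, while each behavior class remains represented.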
