A genetic algorithms approach to modeling the performance of memory-bound computations

Benchmarks that measure memory bandwidth, such as STREAM, Apex-MAPS, and MultiMAPS, are increasingly popular because the von Neumann bottleneck of modern processors makes many calculations memory-bound. We present a scheme for predicting the performance of HPC applications from the results of such benchmarks. A genetic algorithm is used to "learn" bandwidth as a function of cache hit rates for each machine, with MultiMAPS as the fitness test. The results are 56 individual performance predictions, including 3 full-scale parallel applications run on 5 different modern HPC architectures with various CPU counts and inputs, predicted to within 10% average difference of independently verified runtimes.
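To make the idea of "learning" bandwidth from cache hit rates concrete, the following is a minimal sketch and not the method of the paper. Everything in it is an assumption made for illustration: the three-parameter model form in `predicted_bandwidth`, the synthetic samples standing in for MultiMAPS measurements, the GB/s values, and the particular operators (tournament selection, blend crossover, Gaussian mutation, elitism).

```python
import random

def predicted_bandwidth(params, l1_hit, l2_hit):
    """Hypothetical model (an assumption): time per byte is the hit-fraction-weighted
    sum of inverse per-level bandwidths. l2_hit is cumulative, so l2_hit >= l1_hit."""
    bw_l1, bw_l2, bw_mem = params
    time_per_byte = (l1_hit / bw_l1
                     + (l2_hit - l1_hit) / bw_l2
                     + (1.0 - l2_hit) / bw_mem)
    return 1.0 / time_per_byte

def mean_relative_error(params, samples):
    """Fitness test: average relative gap between modeled and measured bandwidth."""
    return sum(abs(predicted_bandwidth(params, h1, h2) - bw) / bw
               for h1, h2, bw in samples) / len(samples)

def make_synthetic_samples():
    """Stand-in for benchmark output: (L1 hit rate, L2 hit rate, bandwidth in GB/s).
    Values are generated from made-up parameters, not taken from any real machine."""
    true_params = (40.0, 15.0, 3.0)
    hit_rates = [(0.99, 1.0), (0.9, 0.99), (0.7, 0.95),
                 (0.5, 0.9), (0.3, 0.6), (0.1, 0.2)]
    return [(h1, h2, predicted_bandwidth(true_params, h1, h2)) for h1, h2 in hit_rates]

def evolve(samples, generations=200, pop_size=40, seed=0):
    """Evolve parameter triples; returns the best triple and the best error per generation."""
    rng = random.Random(seed)

    def error(p):
        return mean_relative_error(p, samples)

    population = [tuple(rng.uniform(0.5, 100.0) for _ in range(3))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        next_gen = population[:2]  # elitism: carry the two best over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of three random individuals becomes a parent.
            a = min(rng.sample(population, 3), key=error)
            b = min(rng.sample(population, 3), key=error)
            w = rng.random()
            # Blend crossover followed by small multiplicative Gaussian mutation.
            child = tuple(max(0.1, (w * x + (1.0 - w) * y) * (1.0 + rng.gauss(0.0, 0.05)))
                          for x, y in zip(a, b))
            next_gen.append(child)
        population = next_gen
    best = min(population, key=error)
    history.append(error(best))
    return best, history

if __name__ == "__main__":
    best, history = evolve(make_synthetic_samples())
    print("best parameters (GB/s):", best)
    print("initial error:", history[0], "final error:", history[-1])
```

Because the two best individuals are carried over unchanged, the best error never increases from one generation to the next. In a real setting the synthetic samples would be replaced by measured benchmark points for one machine, and the model form would need to reflect that machine's memory hierarchy.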
