EMPS: an environment for memory performance studies

This paper presents an overview of the Environment for Memory Performance Studies (EMPS). EMPS is a framework that allows different data-gathering and simulation tools to be composed to predict the performance of parallel programs on a variety of current and future high-end computing (HEC) systems. The framework seeks to combine the automated nature of direct-execution simulation with the predictive capabilities of performance modeling.
