EMPS: an environment for memory performance studies
Jeffrey K. Hollingsworth | Allan Snavely | Simone Sbaraglia | Kattamuri Ekanadham
[1] Margaret Martonosi,et al. MemSpy: analyzing memory system bottlenecks in programs , 1992, SIGMETRICS '92/PERFORMANCE '92.
[2] Mark Horowitz,et al. ATUM: a new technique for capturing address traces using microcode , 1986, ISCA '86.
[3] R. L. Sites,et al. ATUM: a new technique for capturing address traces using microcode , 1986, ISCA '86.
[4] Jack J. Dongarra,et al. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[5] Hideo Aiso,et al. Proceedings of the 16th annual international symposium on Computer architecture , 1986 .
[6] Jeffrey K. Hollingsworth,et al. An API for Runtime Code Patching , 2000, Int. J. High Perform. Comput. Appl..
[7] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[8] Josep Torrellas,et al. False Sharing and Spatial Locality in Multiprocessor Caches , 1994, IEEE Trans. Computers.
[9] Robert J. Fowler,et al. MINT: a front end for efficient simulation of shared-memory multiprocessors , 1994, Proceedings of International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[10] Thomas M. Conte,et al. Reducing state loss for effective trace sampling of superscalar processors , 1996, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.
[11] Adolfy Hoisie,et al. A comparison between the Earth Simulator and AlphaServer systems using predictive application performance models , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[12] Jeffrey K. Hollingsworth,et al. Using Hardware Performance Monitors to Isolate Memory Bottlenecks , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[13] Laura Carrington,et al. A Framework for Application Performance Modeling and Prediction , 2002 .
[14] John L. Hennessy,et al. Performance debugging shared memory multiprocessor programs with MTOOL , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[15] Lin Sun,et al. Semi-Empirical Multiprocessor Performance Predictions , 1996, J. Parallel Distributed Comput..
[16] André Seznec,et al. Choosing representative slices of program execution for microarchitecture simulations: a preliminary , 2000 .
[17] Marvin Theimer,et al. Tango Lite: a Multiprocessor Simulation Environment. Unpublished Introduction and User's Guide , 2008 .
[18] Brad Calder,et al. Using SimPoint for accurate and efficient simulation , 2003, SIGMETRICS '03.
[19] Kathryn S. McKinley,et al. Hoard: a scalable memory allocator for multithreaded applications , 2000, SIGP.
[20] Mark M. Mathis,et al. A performance model of non-deterministic particle transport on large-scale systems , 2003, Future Gener. Comput. Syst..
[21] Jesús Labarta,et al. Performance Modeling of HPC Applications , 2003, PARCO.
[22] Jeffrey K. Hollingsworth,et al. SIGMA: A Simulator Infrastructure to Guide Memory Analysis , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[23] Timothy Sherwood,et al. Using SimPoint for accurate and efficient simulation , 2003 .
[24] Celso L. Mendes,et al. Performance Stability and Prediction , 1994 .
[25] Alan Jay Smith,et al. Performance Characterization of Optimizing Compilers , 1992, IEEE Trans. Software Eng..
[26] Alan Jay Smith,et al. Analysis of benchmark characteristics and benchmark performance prediction , 1996, TOCS.
[27] Jens Simon,et al. Accurate Performance Prediction for Massively Parallel Systems and Its Applications , 1996, Euro-Par, Vol. II.
[28] James R. Larus,et al. StormWatch: a tool for visualizing memory system protocols , 1995 .
[29] James E. Smith,et al. Modeling superscalar processors via statistical simulation , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[30] Adolfy Hoisie,et al. A performance model of non-deterministic particle transport on large-scale systems , 2006, Future Gener. Comput. Syst..
[31] Ware Myers. Supercomputing 91 , 1992 .
[32] Jesús Labarta,et al. A Framework for Performance Modeling and Prediction , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[33] James R. Larus,et al. Tempest and typhoon: user-level shared memory , 1994, ISCA '94.
[34] Kevin Skadron,et al. Minimal subset evaluation: rapid warm-up for simulated hardware state , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.
[35] Anoop Gupta,et al. The Stanford FLASH multiprocessor , 1994, ISCA '94.
[36] Laura Carrington,et al. A performance prediction framework for scientific applications , 2003, Future Gener. Comput. Syst..
[37] Daniel A. Reed,et al. Integrated compilation and scalability analysis for parallel systems , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[38] Thomas J. LeBlanc,et al. Parallel performance prediction using lost cycles analysis , 1994, Proceedings of Supercomputing '94.
[39] Alan Jay Smith,et al. Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes , 1995, IEEE Trans. Computers.
[40] Margaret Martonosi,et al. Integrating performance monitoring and communication in parallel computers , 1996, SIGMETRICS '96.