Modeling and predicting performance of high performance computing applications on hardware accelerators
Scott B. Baden | Stephen W. Poole | Laura Carrington | Allan Snavely | Didem Unat | Mitesh R. Meswani
[1] Somayeh Sardashti, et al. The gem5 simulator. CARN, 2011.
[2] SIGARCH. ISCA 2009: The 36th Annual International Symposium on Computer Architecture, Conference Proceedings, Austin, Texas, USA, 20-24 June 2009. 2009.
[3] Bashar Qudah, et al. Accelerating the HMMER sequence analysis suite using conventional processors. 20th International Conference on Advanced Information Networking and Applications, Volume 1 (AINA'06), 2006.
[4] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. ISCA '09, 2009.
[5] Ludmila Svobodová. Computer Performance Measurement and Evaluation Methods: Analysis and Applications. 1974.
[6] David I. August, et al. Microarchitectural exploration with Liberty. 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35), 2002.
[7] David J. Lilja, et al. Simulation of computer architectures: simulators, benchmarks, methodologies, and recommendations. IEEE Transactions on Computers, 2006.
[8] Clark Jeffries. The Memory Model. 1991.
[9] N. K. Govindaraju, et al. A Memory Model for Scientific Algorithms on Graphics Processors. ACM/IEEE SC 2006 Conference (SC'06), 2006.
[10] Adolfy Hoisie, et al. Modelling the performance of large-scale systems. IEE Proc. Softw., 2003.
[11] Lin Sun, et al. Semi-Empirical Multiprocessor Performance Predictions. J. Parallel Distributed Comput., 1996.
[12] Alan Jay Smith, et al. Analysis of benchmark characteristics and benchmark performance prediction. TOCS, 1996.
[13] Sadaf R. Alam, et al. An Exploration of Performance Attributes for Symbolic Modeling of Emerging Processing Devices. HPCC, 2007.
[14] Ivona Brandic, et al. Performance Modeling and Prediction of Parallel and Distributed Computing Systems: A Survey of the State of the Art. First International Conference on Complex, Intelligent and Software Intensive Systems (CISIS'07), 2007.
[15] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley. 2006.
[16] Daniel A. Reed, et al. Integrated compilation and scalability analysis for parallel systems. Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), 1998.
[17] Todd M. Austin, et al. The SimpleScalar tool set, version 2.0. CARN, 1997.
[18] Laura Carrington, et al. A Framework for Application Performance Modeling and Prediction. 2002.
[19] Christopher J. Hughes, et al. RSIM: Simulating Shared-Memory Multiprocessors with ILP Processors. Computer, 2002.
[20] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures. CACM, 2009.
[21] Tony M. Brewer, et al. Instruction Set Innovations for the Convey HC-1 Computer. IEEE Micro, 2010.
[22] Ramesh Subramonian, et al. LogP: a practical model of parallel computation. CACM, 1996.
[23] Erich Strohmaier, et al. A genetic algorithms approach to modeling the performance of memory-bound computations. Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07), 2007.
[24] Kim M. Hazelwood, et al. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2011.
[25] Michael Laurenzano, et al. How well can simple metrics represent the performance of HPC applications? ACM/IEEE SC 2005 Conference (SC'05), 2005.
[26] R. Saavedra, et al. Measuring Cache and TLB Performance and Their Effect on Benchmark Run Times. USC-CS-93-546, 1993.
[27] Michael Laurenzano, et al. PSINS: An Open Source Event Tracer and Execution Simulator. 2009 DoD High Performance Computing Modernization Program Users Group Conference, 2009.
[28] Chris J. Scheiman, et al. LogGP: incorporating long messages into the LogP model, one step closer towards a realistic model for parallel computation. SPAA '95, 1995.
[29] Brad Calder, et al. Using SimPoint for accurate and efficient simulation. SIGMETRICS '03, 2003.
[30] Jesús Labarta, et al. A Framework for Performance Modeling and Prediction. ACM/IEEE SC 2002 Conference (SC'02), 2002.
[31] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator. 2009 IEEE International Symposium on Performance Analysis of Systems and Software, 2009.
[32] Yossi Matias, et al. The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms. SIAM J. Comput., 1999.
[33] Paul D. Gader, et al. Image algebra techniques for parallel image processing. 1987.
[34] David H. Bailey, et al. The NAS Parallel Benchmarks. Int. J. High Perform. Comput. Appl., 1991.
[35] Fredrik Larsson, et al. Simics: A Full System Simulation Platform. Computer, 2002.
[36] Stephen W. Poole, et al. An idiom-finding tool for increasing productivity of accelerators. ICS '11, 2011.
[37] Tom R. Halfhill. NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Processing. 2009.
[38] Michael Laurenzano, et al. PEBIL: Efficient static binary instrumentation for Linux. 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), 2010.
[39] Michael A. Frumkin, et al. Automatic Recognition of Performance Idioms in Scientific Applications. 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
[40] Jason D. Bakos. High-Performance Heterogeneous Computing with the Convey HC-1. Computing in Science & Engineering, 2010.