Performance modelling for parallel PDE solvers on NUMA-systems
[1] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior, 1999, TOPL.
[2] Hong Linh Truong,et al. On Using SCALEA for Performance Analysis of Distributed and Parallel Programs , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[3] Erik Hagersten,et al. A statistical multiprocessor cache model , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[4] Erik Hagersten,et al. StatCache: a probabilistic approach to efficient and accurate data locality analysis, 2004, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004.
[5] Wolfgang E. Nagel,et al. Group-Based Performance Analysis for Multithreaded SMP Cluster Applications , 2001, Euro-Par.
[6] Dimitrios S. Nikolopoulos,et al. A transparent runtime data distribution engine for OpenMP, 2000.
[7] Zhao Zhang,et al. Performance Modeling and Tuning Strategies of Mixed Mode Collective Communications , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[8] Erik Hagersten,et al. WildFire: a scalable path for SMPs , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[9] Jonathan Harris,et al. Extending OpenMP For NUMA Machines , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[10] William Gropp,et al. High-performance parallel implicit CFD , 2001, Parallel Comput..
[11] Leslie Greengard,et al. A fast algorithm for particle simulations, 1987.
[12] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[13] A. Charlesworth. The Sun Fireplane System Interconnect , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[14] H. H. Rachford,et al. The Numerical Solution of Parabolic and Elliptic Differential Equations, 1955.
[15] Sverker Holmgren,et al. Improving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers , 2004, International Conference on Computational Science.
[16] Jeffrey K. Hollingsworth,et al. Using Hardware Performance Monitors to Isolate Memory Bottlenecks , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[17] Jesús Labarta,et al. Generation of Simple Analytical Models for Message Passing Applications , 2004, Euro-Par.
[18] Alan George,et al. Computer Solution of Large Sparse Positive Definite, 1981.
[19] Mark Horowitz,et al. An analytical cache model , 1989, TOCS.
[20] Sverker Holmgren,et al. Performance of PDE solvers on a self-optimizing NUMA architecture , 2002, Parallel Algorithms Appl..
[21] E. Ayguade,et al. Scaling Irregular Parallel Codes with Minimal Programming Effort , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[22] Felix Wolf,et al. CATCH - A Call-Graph Based Automatic Tool for Capture of Hardware Performance Metrics for MPI and OpenMP Applications , 2002, Euro-Par.
[23] Jeffrey K. Hollingsworth,et al. SIGMA: A Simulator Infrastructure to Guide Memory Analysis , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[24] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[25] Sverker Holmgren,et al. Analyzing Advanced PDE Solvers Through Simulation , 2004, PARA.
[26] Fabrizio Petrini,et al. Predictive Performance and Scalability Modeling of a Large-Scale Application , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[27] Luiz André Barroso,et al. Piranha: a scalable architecture based on single-chip multiprocessing, 2000, Proceedings of 27th International Symposium on Computer Architecture.
[28] Csaba Andras Moritz,et al. Performance Modeling and Evaluation of MPI , 2001, J. Parallel Distributed Comput..
[29] Sverker Holmgren,et al. Geographical Locality and Dynamic Data Migration for OpenMP Implementations of Adaptive PDE Solvers , 2006, IWOMP.
[30] Erik Hagersten,et al. SIP: Performance Tuning through Source Code Interdependence , 2002, Euro-Par.
[31] John M. Mellor-Crummey,et al. Cross-architecture performance predictions for scientific applications using parameterized models , 2004, SIGMETRICS '04/Performance '04.
[32] Eduard Ayguadé,et al. Is Data Distribution Necessary in OpenMP? , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[33] Lisa Noordergraaf,et al. Performance experiences on Sun's Wildfire prototype , 1999, SC '99.