Performance characterization and evaluation of parallel PDE solvers

Computer simulations that solve partial differential equations (PDEs) are common in many fields of science and engineering. To decrease the execution time of the simulations, the PDEs can be solved ...


[33]  Ken Kennedy,et al.  Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .

[34]  David A. Patterson,et al.  Computer Architecture - A Quantitative Approach, 5th Edition , 1996 .

[35]  Ulrich Rüde,et al.  Cache Optimization for Structured and Unstructured Grid Multigrid , 2000 .

[36]  Bradford Sturtevant,et al.  Experiments on the Richtmyer-Meshkov instability of an air/SF6 interface , 1995 .

[37]  Allen D. Malony,et al.  The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl..

[38]  Richard D. Hornung,et al.  Enhancing scalability of parallel structured AMR calculations , 2003, ICS '03.

[39]  Richard Wolski,et al.  The network weather service: a distributed resource performance forecasting service for metacomputing , 1999, Future Gener. Comput. Syst..

[40]  Matthew W. Choptuik Experiences with an adaptive mesh refinement algorithm in numerical relativity. , 1989 .

[41]  Jaideep Ray,et al.  A heuristic re-mapping algorithm reducing inter-level communication in SAMR applications. , 2003 .

[42]  Antony Jameson,et al.  How Many Steps are Required to Solve the Euler Equations of Steady, Compressible Flow: In Search of a Fast Solution Algorithm , 2001 .

[43]  Scott Devine,et al.  Using the SimOS machine simulator to study complex computer systems , 1997, TOMC.

[44]  Anthony T. Chronopoulos,et al.  s-step iterative methods for symmetric linear systems , 1989 .

[45]  Manish Parashar,et al.  Characterizing the Performance of Dynamic Distribution and Load-Balancing Techniques for Adaptive Grid Hierarchies , 1999 .

[46]  Erik Hagersten,et al.  Miss penalty reduction using bundled capacity prefetching in multiprocessors , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[47]  Zhiling Lan,et al.  Dynamic Load Balancing of SAMR Applications on Distributed Systems , 2001, ACM/IEEE SC 2001 Conference (SC'01).

[48]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[49]  Seung Ryoul Maeng,et al.  An adaptive sequential prefetching scheme in shared-memory multiprocessors , 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162).

[50]  Jeffrey K. Hollingsworth,et al.  Using Hardware Performance Monitors to Isolate Memory Bottlenecks , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[51]  Jarmo Rantakokko,et al.  Algorithmic optimizations of a conjugate gradient solver on shared memory architectures , 2006, Int. J. Parallel Emergent Distributed Syst..

[52]  Ralf Deiterding,et al.  A virtual test facility for the efficient simulation of solid material response under strong shock and detonation wave loading , 2006, Engineering with Computers.

[53]  Allen D. Malony,et al.  Computational Quality of Service for Scientific CCA Applications: Composition, Substitution, and Reconfiguration , 2006 .

[54]  James C. Browne,et al.  On partitioning dynamic adaptive grid hierarchies , 1996, Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences.

[55]  G. Bryan,et al.  Cosmological Adaptive Mesh Refinement , 1998, astro-ph/9807121.

[56]  James Arthur Kohl,et al.  Parallel PDE-Based Simulations Using the Common Component Architecture , 2006 .

[57]  Michel Dubois,et al.  Sequential Hardware Prefetching in Shared-Memory Multiprocessors , 1995, IEEE Trans. Parallel Distributed Syst..

[58]  Allen D. Malony,et al.  PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[59]  Michael L. Gittings,et al.  MODELING THE 1958 LITUYA BAY MEGA-TSUNAMI, II , 2002 .

[60]  Erik Hagersten,et al.  StatCache: a probabilistic approach to efficient and accurate data locality analysis , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[61]  Allen D. Malony,et al.  Design and implementation of a parallel performance data management framework , 2005, 2005 International Conference on Parallel Processing (ICPP'05).

[62]  Zhiling Lan,et al.  A novel dynamic load balancing scheme for parallel systems , 2002, J. Parallel Distributed Comput..

[63]  David A. Wood,et al.  A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches , 1994, IEEE Trans. Computers.

[64]  Sverker Holmgren,et al.  Implementation Issues for High Performance CFD , 2004 .

[65]  Thomas M. Conte,et al.  Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation , 1998, IEEE Trans. Computers.

[66]  Trevor N. Mudge,et al.  Trace-driven memory simulation: a survey , 1997, CSUR.

[67]  Alan Jay Smith,et al.  Evaluating Associativity in CPU Caches , 1989, IEEE Trans. Computers.