Leveraging performance of 3D finite difference schemes in large scientific computing simulations
[1] Robert Strzodka et al. Impact of System and Cache Bandwidth on Stencil Computations Across Multiple Processor Generations, 2011.
[2] Olivier Temam et al. Cache interference phenomena, 1994, SIGMETRICS.
[3] A three dimensional global weather prediction model using a finite element scheme for vertical discretization, 1989.
[4] Alok N. Choudhary et al. Improved parallel I/O via a two-phase run-time access strategy, 1993, CARN.
[5] Mauricio Hanzich et al. Unveiling WARIS Code, a Parallel and Multi-purpose FDM Framework, 2013, ENUMATH.
[6] A. Prieto et al. Perfectly matched layers for modelling seismic oceanography experiments, 2008.
[7] Todd C. Mowry et al. Tolerating latency through software-controlled data prefetching, 1994.
[8] Leonid Oliker et al. Impact of modern memory subsystems on cache optimizations for stencil computations, 2005, MSP '05.
[9] Zhenman Fang et al. Multi-stage coordinated prefetching for present-day processors, 2014, ICS '14.
[10] John D. McCalpin et al. Time Skewing: A Value-Based Approach to Optimizing for Memory Locality, 1999.
[11] Chau-Wen Tseng et al. Improving data locality with loop transformations, 1996, TOPL.
[12] Volker Strumpen et al. Cache oblivious stencil computations, 2005, ICS '05.
[13] Gerhard Wellein et al. LIKWID: Lightweight Performance Tools, 2011, CHPC.
[14] Marianne Winslett et al. Improving MPI-IO output performance with active buffering plus threads, 2003, Proceedings International Parallel and Distributed Processing Symposium.
[15] Georg Hager et al. Introducing a Performance Model for Bandwidth-Limited Loop Kernels, 2009, PPAM.
[16] Samuel Williams et al. An auto-tuning framework for parallel multicore stencil computations, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[17] Liu Peng et al. High-order stencil computations on multicore clusters, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[18] Chau-Wen Tseng et al. Tiling Optimizations for 3D Scientific Computations, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[19] Samuel Williams et al. Implicit and explicit optimizations for stencil computations, 2006, MSPC '06.
[20] Patrick R. Amestoy et al. 3D Frequency-domain Finite-difference Modeling of Acoustic Wave Propagation Using a Massively Parallel Direct Solver: a Feasibility Study, 2005.
[21] Mauricio Hanzich et al. Evaluation of 3D RTM On HPC Platforms, 2008.
[22] Tarek S. Abdelrahman et al. Fusion of Loops for Parallelism and Locality, 1997, IEEE Trans. Parallel Distributed Syst.
[23] Samuel Williams et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures, 2008.
[24] Collin McCurdy et al. Diagnosis and optimization of application prefetching performance, 2013, ICS '13.
[25] Eduard Ayguadé et al. Exploiting memory customization in FPGA for 3D stencil computations, 2009, 2009 International Conference on Field-Programmable Technology.
[26] David G. Wonnacott et al. Time Skewing for Parallel Computers, 1999, LCPC.
[27] Arnau Folch et al. FALL3D: A computational model for transport and deposition of volcanic ash, 2009, Comput. Geosci.
[28] Arnau Folch et al. Volcanic ash over Europe during the eruption of Eyjafjallajökull on Iceland, April–May 2010, 2012.
[29] V. Thomée. From finite differences to finite elements: a short history of numerical analysis of partial differential equations, 2001.
[30] Samuel Williams et al. Auto-tuning performance on multicore computers, 2008.
[31] Apan Qasem et al. Understanding stencil code performance on multicore architectures, 2011, CF '11.
[32] George A. McMechan et al. A review of seismic acoustic imaging by reverse-time migration, 1989, Int. J. Imaging Syst. Technol.
[33] W. L. Ko et al. Reentry heat transfer analysis of the space shuttle orbiter, 1982.
[34] Collin McCurdy et al. Characterizing the Impact of Prefetching on Scientific Application Performance, 2013, PMBS@SC.
[35] George Ho et al. PAPI: A Portable Interface to Hardware Performance Counters, 1999.
[36] Gerhard Wellein et al. Multi-core architectures: Complexities of performance prediction and the impact of cache topology, 2009, arXiv.
[37] Jianbin Fang et al. An Empirical Study of Intel Xeon Phi, 2013, arXiv.
[38] Robin L. Dennis et al. NARSTO critical review of photochemical models and modeling, 2000.
[39] Anne Rogers et al. Software support for speculative loads, 1992, ASPLOS V.
[40] Volker Strumpen et al. The cache complexity of multithreaded cache oblivious algorithms, 2006, SPAA.
[41] Alfons G. Hoekstra et al. Efficient analytical modelling of multi-level set-associative caches, 1999.
[42] Catherine de Groot-Hedlin et al. A Finite Difference Solution to the Helmholtz Equation in a Radially Symmetric Waveguide: Application to Near-Source Scattering in Ocean Acoustics, 2008.
[43] Michael E. Wolf et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[44] Katherine Yelick et al. Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply, 2004.