Autotuning divide‐and‐conquer stencil computations
Maryam Mehri Dehnavi | Charles E. Leiserson | Ekanathan Palamadai Natarajan
[1] Uday Bondhugula,et al. Effective automatic parallelization of stencil computations , 2007, PLDI '07.
[2] I-Hsin Chung,et al. Active Harmony: Towards Automated Performance Tuning , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[3] David E. Keyes,et al. Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates , 2014, SIAM J. Sci. Comput..
[4] Jorge Nuno Silva,et al. Mathematical Games , 1959, Nature.
[5] Shoaib Ashraf Kamil,et al. Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages , 2012 .
[6] J. Hull. Options, Futures, and Other Derivatives , 1989 .
[7] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[8] A. Nakano,et al. Multiresolution molecular dynamics algorithm for realistic materials modeling on parallel computers , 1994 .
[9] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[10] Chau-Wen Tseng,et al. Tiling Optimizations for 3D Scientific Computations , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[11] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[12] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Payut Pantawongdecha. Autotuning divide-and-conquer matrix-vector multiplication , 2016 .
[14] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[15] José M. F. Moura,et al. Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms , 2004, Int. J. High Perform. Comput. Appl..
[16] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[17] Helmar Burkhart,et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[18] A. Nitsure,et al. Implementation and optimization of a cache-oblivious Lattice Boltzmann algorithm , 2006 .
[19] Allen Taflove,et al. Computational Electrodynamics: The Finite-Difference Time-Domain Method , 1995 .
[20] Volker Strumpen,et al. The Cache Complexity of Multithreaded Cache Oblivious Algorithms , 2009, SPAA '06.
[21] Samuel Williams,et al. Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms , 2009, J. Parallel Distributed Comput..
[22] Volker Strumpen,et al. The cache complexity of multithreaded cache oblivious algorithms , 2006, SPAA.
[23] Shoaib Kamil,et al. OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[24] Matteo Frigo,et al. A fast Fourier transform compiler , 1999, SIGP.
[25] Weiqiang Wang,et al. In-Core Optimization of High-Order Stencil Computations , 2009, PDPTA.
[26] Rainer Bleck,et al. Salinity-driven Thermocline Transients in a Wind- and Thermohaline-forced Isopycnic Coordinate Model of the North Atlantic , 1992 .
[27] Paulius Micikevicius,et al. 3D finite difference computation on GPUs using CUDA , 2009, GPGPU-2.
[28] Bradley C. Kuszmaul,et al. The pochoir stencil compiler , 2011, SPAA '11.
[29] Leonid Oliker,et al. Impact of modern memory subsystems on cache optimizations for stencil computations , 2005, MSP '05.
[30] David E. Keyes,et al. Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[31] Samuel Williams,et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors , 2007, SIAM Rev..
[32] Uday Bondhugula,et al. Tiling and optimizing time-iterated computations over periodic domains , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[33] James F. Epperson,et al. An Introduction to Numerical Methods and Analysis , 2001 .
[34] Wei Shyy,et al. Lattice Boltzmann Method for 3-D Flows with Curved Boundary , 2000 .
[35] Uday Bondhugula,et al. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model , 2008, CC.
[36] Volker Strumpen,et al. Cache oblivious stencil computations , 2005, ICS '05.
[37] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[38] Liu Peng,et al. High-order stencil computations on multicore clusters , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[39] Samuel Williams,et al. Implicit and explicit optimizations for stencil computations , 2006, MSPC '06.
[40] Weiqiang Wang,et al. A Multilevel Parallelization Framework for High-Order Stencil Computations , 2009, Euro-Par.
[41] Samuel Williams,et al. An auto-tuning framework for parallel multicore stencil computations , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).