Domain-specific translator and optimizer for massive on-chip parallelism
[1] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[2] Rudolf Eigenmann,et al. Cetus - An Extensible Compiler Infrastructure for Source-to-Source Transformation , 2003, LCPC.
[3] J. Shalf,et al. Auto-Tuning the 27-point Stencil for Multicore , 2009 .
[4] Ulrich Rüde,et al. Memory Characteristics of Iterative Methods , 1999, SC.
[5] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[6] Markus Schordan,et al. Treating a user-defined parallel library as a domain-specific language , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[7] Jianbin Fang,et al. A Comprehensive Performance Comparison of CUDA and OpenCL , 2011, 2011 International Conference on Parallel Processing.
[8] Philip M. Morse,et al. Methods of Mathematical Physics , 1947, The Mathematical Gazette.
[9] Kunle Olukotun,et al. Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency , 2007 .
[10] Chau-Wen Tseng,et al. Tiling Optimizations for 3D Scientific Computations , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[11] Jack Dongarra,et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects , 2009 .
[12] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, HiPC 2008.
[13] Rudolf Eigenmann,et al. OpenMPC: Extended OpenMP Programming and Tuning for GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Xing Cai,et al. Stability of Two Time-Integrators for the Aliev-Panfilov System , 2011 .
[15] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.
[16] Edmond Chow,et al. Exploiting 162-Nanosecond End-to-End Communication Latency on Anton , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] François Bodin,et al. Heterogeneous multicore parallel programming for graphics processing units , 2009 .
[18] Dean M. Tullsen,et al. Simultaneous multithreading: a platform for next-generation processors , 1997, IEEE Micro.
[19] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[20] William J. Dally,et al. Imagine: Media Processing with Streams , 2001, IEEE Micro.
[21] William E. Lorensen,et al. The Transfer Function Bake-Off , 2001, IEEE Computer Graphics and Applications.
[22] Mahmut T. Kandemir,et al. Leakage Current: Moore's Law Meets Static Power , 2003, Computer.
[23] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..
[24] Tor Gillberg,et al. A New Parallel 3D Front Propagation Algorithm for Fast Simulation of Geological folds , 2012, ICCS.
[25] Samuel Williams,et al. Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[26] Dhabaleswar K. Panda,et al. Scalable Earthquake Simulation on Petascale Supercomputers , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[27] Samuel Williams,et al. Auto-tuning performance on multicore computers , 2008 .
[28] Samuel Williams,et al. Hardware/software co-design for energy-efficient seismic modeling , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[29] Kim B. Olsen,et al. On the implementation of perfectly matched layers in a three‐dimensional fourth‐order velocity‐stress finite difference scheme , 2003 .
[30] Barbara Chapman,et al. Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation) , 2007 .
[31] N. Britton. Reaction-diffusion equations and their applications to biology. , 1989 .
[32] Stephen W. Poole,et al. An idiom-finding tool for increasing productivity of accelerators , 2011, ICS '11.
[33] John Shalf,et al. Exascale Computing Technology Challenges , 2010, VECPAR.
[34] Helmar Burkhart,et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[35] Jung Ho Ahn,et al. Merrimac: Supercomputing with Streams , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[36] Arie E. Kaufman,et al. GPU Cluster for High Performance Computing , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[37] Samuel Williams,et al. The potential of the cell processor for scientific computing , 2005, CF '06.
[38] Joe Michael Kniss,et al. Multidimensional Transfer Functions for Interactive Volume Rendering , 2002, IEEE Trans. Vis. Comput. Graph..
[39] Scott B. Baden,et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C , 2011, ICS '11.
[40] S. Timoshenko,et al. An Introduction to the Theory of Elasticity , 1936, Nature.
[41] Richard W. Vuduc,et al. Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization , 2009, LCPC.
[42] P. Maechling,et al. Strong shaking in Los Angeles expected from southern San Andreas earthquake , 2006 .
[43] Michael Wolfe,et al. Implementing the PGI Accelerator model , 2010, GPGPU-3.
[44] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[45] Trevor N. Mudge,et al. Power: A First-Class Architectural Design Constraint , 2001, Computer.
[46] J. Strikwerda. Finite Difference Schemes and Partial Differential Equations , 1989 .
[47] Maryann E. Martone,et al. Dimensionality Reduction on Multi-Dimensional Transfer Functions for Multi-Channel Volume Data Sets , 2010, Inf. Vis..
[48] H. Peter Hofstee,et al. Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..
[49] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2009, Parallel Comput..
[50] Rudolf Eigenmann,et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.
[51] R. Aliev,et al. A simple two-variable model of cardiac excitation , 1996 .
[52] Dimitri Komatitsch,et al. Accelerating a three-dimensional finite-difference wave propagation code using GPU graphics cards , 2010 .
[53] Paulius Micikevicius,et al. 3D finite difference computation on GPUs using CUDA , 2009, GPGPU-2.
[54] Christopher G. Harris,et al. A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.
[55] James C. Hoe,et al. Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[56] Samuel Williams,et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors , 2007, SIAM Rev..
[57] Scott B. Baden,et al. Source-to-Source Optimization of CUDA C for GPU Accelerated Cardiac Cell Modeling , 2010, Euro-Par.
[58] Tarek S. Abdelrahman,et al. hiCUDA: a high-level directive-based language for GPU programming , 2009, GPGPU-2.
[59] Scott B. Baden,et al. Interactive data-centric viewpoint selection , 2012, Visualization and Data Analysis.
[60] Luis A. Dalguer,et al. Staggered-grid split-node method for spontaneous rupture simulation , 2007 .
[61] Pradeep Dubey,et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.
[62] Peter M. Athanas,et al. Examining the Viability of FPGA Supercomputing , 2007, EURASIP J. Embed. Syst..
[63] Samuel Williams,et al. Scientific Computing Kernels on the Cell Processor , 2007, International Journal of Parallel Programming.
[64] William Gropp,et al. Efficient Management of Parallelism in Object-Oriented Numerical Software Libraries , 1997, SciTools.
[65] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[66] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.