Using the High Productivity Language Chapel to Target GPGPU Architectures
Bradford L. Chamberlain | María Jesús Garzarán | David Padua | Albert Sidelnik
[1] Vivek Sarkar,et al. JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA , 2009, Euro-Par.
[2] Rice University,et al. High performance Fortran language specification , 1993 .
[3] Justin P. Haldar,et al. Accelerating advanced MRI reconstructions on GPUs , 2008, J. Parallel Distributed Comput..
[4] Michael Wolfe,et al. Implementing the PGI Accelerator model , 2010, GPGPU-3.
[5] Lawrence Snyder,et al. A programmer's guide to ZPL , 1999 .
[6] François Bodin,et al. Heterogeneous multicore parallel programming for graphics processing units , 2009, Sci. Program..
[7] Tarek S. Abdelrahman,et al. hiCUDA: a high-level directive-based language for GPU programming , 2009, GPGPU-2.
[8] Steven J. Deitz,et al. Global-view abstractions for user-defined reductions and scans , 2006, PPoPP '06.
[9] James Reinders,et al. Intel® threading building blocks , 2008 .
[10] John E. Stone,et al. Probing biomolecular machines with graphics processors , 2009, CACM.
[11] François Bodin,et al. Heterogeneous multicore parallel programming for graphics processing units , 2009 .
[12] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..
[13] A. Szalay,et al. Bias and variance of angular correlation functions , 1993 .
[14] Jia Guo,et al. Writing productive stencil codes with overlapped tiling , 2009 .
[15] Steven S. Muchnick,et al. Advanced Compiler Design and Implementation , 1997 .
[16] Andrew B. White,et al. Trailblazing with Roadrunner , 2009, Computing in Science & Engineering.
[17] Sudhakar Yalamanchili,et al. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[18] Klaus Schulten,et al. Accelerating Molecular Modeling Applications with GPU Computing , 2009 .
[19] Michel Dupuis,et al. Computation of electron repulsion integrals using the Rys quadrature method , 1983 .
[20] Mike Murphy,et al. Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs , 2010, CGO '10.
[21] Rohit Chandra,et al. Parallel programming in OpenMP , 2000 .
[22] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[23] Steven J. Deitz,et al. User-defined distributions and layouts in Chapel: philosophy and framework , 2010 .
[24] C. H. Flood,et al. The Fortress Language Specification , 2007 .
[25] John E. Stone,et al. Probing Biomolecular Machines with Graphics Processors , 2009, ACM Queue.
[26] Nicolas Pinto,et al. PyCUDA: GPU Run-Time Code Generation for High-Performance Computing , 2009, ArXiv.
[27] Michael D. McCool,et al. Performance evaluation of GPUs using the RapidMind development platform , 2006, SC.
[28] Daisuke Takahashi,et al. The HPC Challenge (HPCC) benchmark suite , 2006, SC.
[29] Richard W. Vuduc,et al. Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems , 2009, ICS.
[30] Rudolf Eigenmann,et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.
[31] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[32] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.