Paper information - GPU computing is at a tipping point, becoming more widely used in demanding consumer applications and high-performance computing. This article describes the rapid evolution of GPU architectures (from graphics processors to massively parallel many-core multiprocessors), recent developments in GPU compu

... As we enter the era of GPU computing, demanding applications with substantial parallelism increasingly use the massively parallel computing capabilities of GPUs to achieve superior performance and efficiency. Today GPU computing enables applications that we previously thought infeasible because of long execution times.

With the GPU’s rapid evolution from a configurable graphics processor to a programmable parallel processor, the ubiquitous GPU in every PC, laptop, desktop, and workstation is a many-core multithreaded multiprocessor that excels at both graphics and computing applications. Today’s GPUs use hundreds of parallel processor cores executing tens of thousands of parallel threads to rapidly solve large problems having substantial inherent parallelism. They’re now the most pervasive massively parallel processing platform ever available, as well as the most cost-effective.

Using NVIDIA GPUs as examples, this article describes the evolution of GPU computing and its parallel computing model, the enabling architecture and software developments, how computing applications use CPU+GPU coprocessing, example application performance speedups, and trends in GPU computing.

GPU computing’s evolution

Why have GPUs evolved to have large numbers of parallel threads and many cores? The driving force continues to be the real-time graphics performance needed to render complex, high-resolution 3D scenes at interactive frame rates for games. Rendering high-definition graphics scenes is a problem with tremendous inherent parallelism. A graphics programmer writes a single-thread program that draws one pixel, and the GPU runs multiple instances of this thread in parallel, drawing multiple pixels in parallel. Graphics programs, written in shading languages such as Cg or High-Level Shading Language (HLSL), thus scale transparently over a wide range of thread and processor parallelism.

Also, GPU computing programs (written in C or C++ with the CUDA parallel computing model, or using a parallel computing API inspired by CUDA such as DirectCompute or OpenCL) scale transparently over a wide range of parallelism. Software scalability, too, has enabled GPUs to rapidly increase their parallelism and performance with increasing transistor density.
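
The single-thread programming style described above can be illustrated with a small sketch. It is written in plain C++ rather than CUDA so that it runs without a GPU, and it is only a model of the idea, not code from the article. The names `shade_one` and `launch` are hypothetical: `shade_one` plays the role of the per-pixel (or per-element) thread program, and `launch` stands in for the hardware that runs many instances of it. The logical threads run one after another here, whereas a GPU would run them concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread program: computes one output element, the way a
// shader computes one pixel. It only looks at its own index i.
float shade_one(const std::vector<float>& in, std::size_t i) {
    return 2.0f * in[i] + 1.0f;
}

// Hypothetical stand-in for a parallel launch. It runs num_threads logical
// threads; logical thread tid handles elements tid, tid + num_threads, ...
// The threads are executed sequentially in this sketch (assumption: no GPU),
// but no thread depends on another, so they could run in any order.
std::vector<float> launch(const std::vector<float>& in,
                          std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<float> out(in.size());
    for (std::size_t tid = 0; tid < num_threads; ++tid) {
        for (std::size_t i = tid; i < in.size(); i += num_threads) {
            out[i] = shade_one(in, i);
        }
    }
    return out;
}
```

Calling `launch(in, 1)`, `launch(in, 3)`, and `launch(in, 64)` on the same input returns identical vectors. The per-element program never mentions how many threads exist, which is the property that lets the same program spread over more cores without being rewritten.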

W. Dally | J. Nickolls
