The hybrid, heterogeneous nature of future microprocessors and large high-performance computing systems will result in a reliance on two major types of components: multicore/manycore central processing units and special-purpose hardware/massively parallel accelerators. While these technologies have numerous benefits, they also pose substantial performance challenges for developers, including scalability, software tuning, and programming issues.

Researchers at the Forefront Reveal Results from Their Own State-of-the-Art Work

Edited by some of the top researchers in the field and with contributions from a variety of international experts, Scientific Computing with Multicore and Accelerators focuses on the architectural design and implementation of multicore and manycore processors and accelerators, including graphics processing units (GPUs) and the Sony Toshiba IBM (STI) Cell Broadband Engine (BE) currently used in the Sony PlayStation 3. The book explains how numerical libraries, such as LAPACK, help solve computational science problems; explores the emerging area of hardware-oriented numerics; and presents the design of a fast Fourier transform (FFT) and a parallel list-ranking algorithm for the Cell BE. It covers stencil computations, auto-tuning, optimization of computational kernels, sequence alignment and homology, and pairwise computations. The book also evaluates the portability of drug design applications to the Cell BE and illustrates how to exploit the computational capabilities of GPUs for scientific applications. It concludes with chapters on dataflow frameworks, the Charm++ programming model, scan algorithms, and a portable intracore communication framework.

Explores the New Computational Landscape of Hybrid Processors

By offering insight into the process of constructing and effectively using the technology, this volume provides a thorough and practical introduction to the area of hybrid computing. It discusses introductory concepts and simple examples of parallel computing, logical and performance debugging for parallel computing, and advanced topics and issues related to the use and building of many applications.
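Among the GPU topics the book highlights are scan (prefix-sum) algorithms. As a loose illustration of that primitive, and not code taken from the book, the sketch below shows a work-efficient (Blelloch-style) exclusive prefix sum written as a plain sequential C++ loop nest; the function name exclusive_scan and the power-of-two length restriction are assumptions made for brevity, and a GPU version would map each inner iteration onto a thread.

#include <cstddef>
#include <iostream>
#include <vector>

// Work-efficient exclusive prefix sum, written sequentially for clarity.
// Assumes data.size() is a power of two (and at least 1).
void exclusive_scan(std::vector<int>& data) {
    const std::size_t n = data.size();

    // Up-sweep (reduce) phase: build partial sums in place.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 0; i + 2 * stride <= n; i += 2 * stride) {
            data[i + 2 * stride - 1] += data[i + stride - 1];
        }
    }

    // Down-sweep phase: clear the root, then propagate prefixes back down.
    data[n - 1] = 0;
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 0; i + 2 * stride <= n; i += 2 * stride) {
            int t = data[i + stride - 1];
            data[i + stride - 1] = data[i + 2 * stride - 1];
            data[i + 2 * stride - 1] += t;
        }
    }
}

int main() {
    std::vector<int> v{1, 2, 3, 4};
    exclusive_scan(v);               // v becomes {0, 1, 3, 6}
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}

The two-phase structure is what makes the algorithm attractive on throughput-oriented hardware: both sweeps touch each element O(1) times in total, so the overall work stays linear even when the passes are parallelized.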