Implementation and Benchmarking of Two-Dimensional Vortex Interactions on a Graphics Processing Unit

A parallel single-instruction, multiple-data implementation of a two-level nested loop, which uses shared memory, is developed via general-purpose computing on a graphics processing unit (GPGPU). The GPGPU implementation is compared with MATLAB®, C, and other implementations of the same algorithm, which execute primarily on the central processing unit (CPU). The GPGPU implementation is found to be about 80 times faster than the fastest single-threaded implementation. A linear algebra implementation is found to consume excessive memory without a corresponding increase in computational performance. Although the speedup is hardware dependent, the GPGPU algorithm exploits cache memory in a manner that is severely constrained on conventional multicore CPUs. For this reason, the nested loop described here is a natural fit for the single-...
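To make the "two-level nested loop" concrete, the following is a minimal serial sketch in Python of the kind of loop the abstract refers to. It is not the authors' code. It assumes the standard two-dimensional point-vortex (Biot-Savart) kernel; the function name `induced_velocities` and the optional smoothing parameter `core` are hypothetical and chosen only for illustration.

```python
import math


def induced_velocities(x, y, gamma, core=0.0):
    """Velocity induced at each vortex by all other vortices (2D point-vortex kernel).

    x, y  : lists of vortex coordinates
    gamma : list of circulations (positive = counter-clockwise)
    core  : hypothetical smoothing radius; with core=0.0, coincident
            vortices would cause a division by zero
    """
    n = len(x)
    u = [0.0] * n
    v = [0.0] * n
    for i in range(n):          # outer loop: evaluation point
        for j in range(n):      # inner loop: inducing vortex
            if i == j:
                continue        # a point vortex induces no velocity on itself
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            r2 = dx * dx + dy * dy + core * core
            k = gamma[j] / (2.0 * math.pi * r2)
            u[i] -= k * dy
            v[i] += k * dx
    return u, v


# Two equal counter-clockwise vortices one unit apart orbit each other.
u, v = induced_velocities([0.0, 1.0], [0.0, 0.0], [2.0 * math.pi, 2.0 * math.pi])
```

Each outer iteration is independent of the others, which is what makes loops of this shape amenable to single-instruction, multiple-data execution. The loop stores only O(N) values, whereas a formulation that builds full pairwise matrices stores N² entries, which is consistent with the memory cost the abstract reports for the linear algebra implementation.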
