N‐body computations using skeletal frameworks on multicore CPU/graphics processing unit architectures: an empirical performance evaluation

With the emergence of general‐purpose computation on graphics processing units, high‐level approaches that hide the conceptual complexity of the low‐level Compute Unified Device Architecture and Open Computing Language platforms are the subject of active research. However, these approaches may require a trade‐off in achieved performance and utilisation of graphics processing unit hardware, and they may impose algorithmic limitations. In this paper, we present and systematically evaluate the parallel performance of three implementations of the brute‐force, all‐pairs N‐body algorithm with skeletal deployments based on the FastFlow, SkePU and Thrust frameworks. Our results indicate that the skeletal framework implementation achieves up to two orders of magnitude speed‐up over a serial version on a Tesla M2050, with lower implementation complexity than low‐level Compute Unified Device Architecture programming. Copyright © 2013 John Wiley & Sons, Ltd.
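For orientation, the brute-force, all-pairs N-body computation named above can be sketched serially as follows. This is a minimal illustration and not the paper's code: the `Body` and `Vec3` types, the function name `all_pairs_accelerations`, the gravitational constant `G` and the `softening` parameter are all assumptions introduced here. In a skeletal framework, the outer loop over bodies would typically be the part expressed as a data-parallel map.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only.
struct Body { double x, y, z, mass; };
struct Vec3 { double x, y, z; };

// O(N^2) all-pairs gravitational accelerations.
// 'softening' (an assumed parameter) avoids division by zero
// when two bodies are very close together.
std::vector<Vec3> all_pairs_accelerations(const std::vector<Body>& bodies,
                                          double G = 1.0,
                                          double softening = 1e-3) {
    std::vector<Vec3> acc(bodies.size(), Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < bodies.size(); ++i) {      // candidate for a parallel map
        for (std::size_t j = 0; j < bodies.size(); ++j) {  // sum over all other bodies
            if (i == j) continue;
            const double dx = bodies[j].x - bodies[i].x;
            const double dy = bodies[j].y - bodies[i].y;
            const double dz = bodies[j].z - bodies[i].z;
            const double r2 = dx * dx + dy * dy + dz * dz + softening * softening;
            const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
            const double s = G * bodies[j].mass * inv_r3;
            acc[i].x += s * dx;
            acc[i].y += s * dy;
            acc[i].z += s * dz;
        }
    }
    return acc;
}
```

Each iteration of the outer loop writes only to `acc[i]`, so the iterations are independent of one another; that independence is what makes the all-pairs formulation a natural fit for map-style skeletons.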
