Irregular computations are inherently hard to optimize and parallelize automatically. In this paper, we describe a new tuple-based programming paradigm for expressing irregular computations. This paradigm allows irregular computations to be specified at the level of elementary data entries (tuples) rather than in terms of (complicated) data structures; as a consequence, the actual data structures are constructed during the code generation phase. Within this framework, existing implementations of irregular computations in, for instance, the C programming language can be automatically mapped into the tuple-based programming model, and the code generated from the resulting specification is competitive with hand-optimized codes. The potential of this approach is demonstrated on two representative applications: sparse triangular solve, representing sparse linear algebra, and an implementation of the Bellman-Ford algorithm, representing graph algorithms. We demonstrate that, from an ordinary triangular solve code, parallelized implementations can be generated automatically that until now could only be derived by hand, and we show that the performance of these automatically generated implementations is comparable to that of hand-optimized triangular solvers. For the Bellman-Ford algorithm, initial experiments show that the derived GPU implementations achieve speedups in execution time of two to four orders of magnitude over the initial implementation.