Solving linear recurrences on hybrid GPU accelerated manycore systems

The aim of this paper is to show that linear recurrence systems with constant coefficients can be solved efficiently on hybrid GPU-accelerated manycore systems equipped with modern Fermi GPU cards. The main idea is to use a recently developed divide-and-conquer algorithm that can be expressed in terms of Level 2 and 3 BLAS operations. Results of experiments performed on a hybrid system with an Intel Core i7 and an NVIDIA Tesla C2050 are also presented and discussed.
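To illustrate the kind of divide-and-conquer scheme referred to above, the following is a minimal sketch in plain Python. It is not the paper's implementation. It assumes a recurrence of the form x[k] = f[k] + a[0]*x[k-1] + ... + a[m-1]*x[k-m] with x[k] = 0 for k < 0, a block length s that divides n and is at least m, and hypothetical function names (solve_direct, solve_block, solve_divide_and_conquer) chosen only for this example. The independent block solves in step 1 and the block updates in step 3 are the parts that, in a BLAS-based formulation, would plausibly map to matrix-matrix and matrix-vector operations; here they are written as explicit loops.

```python
def solve_direct(a, f):
    """Reference solver: x[k] = f[k] + sum_j a[j-1] * x[k-j], with x[k] = 0 for k < 0."""
    x = []
    for k, fk in enumerate(f):
        acc = fk
        for j, aj in enumerate(a, start=1):
            if k - j >= 0:
                acc += aj * x[k - j]
        x.append(acc)
    return x


def solve_block(a, f, history):
    """Solve the recurrence on one block.

    history[j-1] holds the value at local index -j (j = 1..m),
    i.e. the last m values preceding the block.
    """
    m = len(a)
    x = []
    for t, ft in enumerate(f):
        acc = ft
        for j in range(1, m + 1):
            prev = x[t - j] if t - j >= 0 else history[j - t - 1]
            acc += a[j - 1] * prev
        x.append(acc)
    return x


def solve_divide_and_conquer(a, f, s):
    """Block (divide-and-conquer) solver; assumes s >= len(a) and len(f) % s == 0."""
    m, n = len(a), len(f)
    assert s >= m and n % s == 0
    r = n // s

    # Step 1: solve every block with zero history. The r solves are independent.
    zero = [0.0] * m
    z = [solve_block(a, f[i * s:(i + 1) * s], zero) for i in range(r)]

    # Step 2: homogeneous responses u[j] to a unit value at local index -(j+1).
    u = []
    for j in range(m):
        unit = [0.0] * m
        unit[j] = 1.0
        u.append(solve_block(a, [0.0] * s, unit))

    # Step 3: sequential correction, block by block. By linearity,
    # x_i = z_i + sum_j (value at local index -(j+1)) * u[j].
    x = []
    prev = None
    for i in range(r):
        block = list(z[i])
        if prev is not None:
            for j in range(m):
                h = prev[s - 1 - j]
                for t in range(s):
                    block[t] += h * u[j][t]
        x.extend(block)
        prev = block
    return x
```

For example, with a = [1.0, 1.0], f = [1.0] followed by seven zeros, and s = 4, both solvers produce the first eight Fibonacci numbers. Only step 3 is inherently sequential, and it touches each block once, which is why schemes of this shape are attractive on hardware that favors large, regular operations.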
