Work efficient higher-order vectorisation

Existing approaches to higher-order vectorisation, also known as flattening nested data parallelism, do not preserve the asymptotic work complexity of the source program. Even straightforward examples, such as sparse matrix-vector multiplication, can suffer a severe blow-up in both time and space, which limits the practicality of the method. We discuss why this problem arises, identify the mishandling of index space transforms as its root cause, and present a solution based on a refined representation of nested arrays. We have implemented this solution in Data Parallel Haskell (DPH) and present benchmarks showing that realistic programs, which previously suffered the blow-up, now have the correct asymptotic work complexity. In some cases, the asymptotic complexity of the vectorised program is even better than that of the original.
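To make the setting concrete, the sketch below shows sparse matrix-vector multiplication written as a nested data-parallel computation, using plain Haskell lists to stand in for DPH's parallel arrays; the names and types here are illustrative assumptions, not the paper's code.

```haskell
-- Illustrative sketch only: sparse matrix-vector multiplication as nested
-- data parallelism, with ordinary lists standing in for parallel arrays.
module Main where

type SparseRow    = [(Int, Double)]   -- (column index, non-zero value)
type SparseMatrix = [SparseRow]

-- Nested form: the outer comprehension runs over rows, the inner one over
-- the non-zero entries of each row.  Vectorisation flattens this nesting.
smvm :: SparseMatrix -> [Double] -> [Double]
smvm m v = [ sum [ x * (v !! i) | (i, x) <- row ] | row <- m ]

-- After flattening, the nested matrix is typically held as a flat array of
-- entries plus a segment descriptor recording the length of each row; it is
-- transformations on such index spaces that the paper identifies as the
-- source of the work-complexity blow-up.  (Hypothetical representation.)
data Segmented a = Segmented
  { segLengths :: [Int]   -- one length per row (segment descriptor)
  , segData    :: [a]     -- all non-zero entries, concatenated
  }

main :: IO ()
main = print (smvm [[(0, 1), (2, 3)], [(1, 2)]] [10, 20, 30])
-- expected output: [100.0, 40.0]
```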
