Vectorisation avoidance

Flattening nested parallelism is a vectorising code transformation that converts irregular nested parallelism into flat data parallelism. Although the result has good asymptotic performance, flattening thoroughly restructures the code. It introduces many intermediate data structures and traversals, which subsequent optimisation may or may not eliminate. We present a novel program analysis that identifies the parts of a program where flattening would only introduce overhead, without a compensating gain. We also present empirical evidence that avoiding vectorisation in these cases yields more efficient programs than applying vectorisation and then relying on array fusion to eliminate the intermediates from the resulting code.
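
To make the trade-off concrete, the following is a minimal sketch in Python, not the analysis or implementation described in the paper. It assumes a simple segmented representation of a nested array (segment lengths plus one flat data vector) and a purely scalar function; all names (`flatten`, `unflatten`, `scalar_f`, `lifted_f`, `unvectorised_f`) are hypothetical and chosen for illustration only. The naively lifted version materialises intermediate arrays, while the unvectorised version applies the scalar function in a single traversal of the flat data.

```python
from typing import List, Tuple

# Assumed flat representation of a nested array:
# a list of segment lengths plus a single flat data vector.
Segmented = Tuple[List[int], List[float]]


def flatten(nested: List[List[float]]) -> Segmented:
    """Convert an irregular nested list into (segment lengths, flat data)."""
    lengths = [len(segment) for segment in nested]
    data = [x for segment in nested for x in segment]
    return lengths, data


def unflatten(segmented: Segmented) -> List[List[float]]:
    """Rebuild the nested list from its segmented representation."""
    lengths, data = segmented
    nested, start = [], 0
    for n in lengths:
        nested.append(data[start:start + n])
        start += n
    return nested


def scalar_f(x: float) -> float:
    """A purely scalar computation: it contains no nested parallelism."""
    return x * x + 1.0


def lifted_f(xs: List[float]) -> Tuple[List[float], int]:
    """Naively vectorised form: every primitive becomes a whole-array operation.

    Returns the result and the number of intermediate arrays materialised.
    """
    squares = [x * x for x in xs]           # intermediate array 1
    ones = [1.0] * len(xs)                  # intermediate array 2 (replicated constant)
    result = [s + o for s, o in zip(squares, ones)]
    return result, 2


def unvectorised_f(xs: List[float]) -> Tuple[List[float], int]:
    """Vectorisation avoided: one traversal applying the scalar code directly."""
    return [scalar_f(x) for x in xs], 0


if __name__ == "__main__":
    lengths, data = flatten([[1.0, 2.0], [], [3.0]])
    lifted, lifted_temps = lifted_f(data)
    direct, direct_temps = unvectorised_f(data)
    print(unflatten((lengths, direct)), lifted_temps, direct_temps)
```

Both versions compute the same values over the flat data vector. In this sketch the only difference is the number of intermediate arrays, which is the kind of overhead that would otherwise have to be removed by array fusion.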
