Shape-based cost analysis of skeletal parallel programs

This work presents an automatic cost-analysis system for an implicitly parallel skeletal programming language. Deducing interesting dynamic characteristics of parallel programs (in particular, run time) is well known to be intractable in the general case, but the difficulty can be alleviated by restricting the programs that can be expressed. By combining two research threads that take this route, the “skeletal” and “shapely” paradigms, we produce a completely automated, computation- and communication-sensitive cost analysis system. This builds on earlier work in the area by quantifying communication as well as computation costs, with the former being derived for the Bulk Synchronous Parallel (BSP) model. We present details of our shapely skeletal language and its BSP implementation strategy, together with an account of the analysis mechanism by which program behaviour information (such as shape and cost) is statically deduced. This information can be used at compile time to optimise a BSP implementation and to analyse computation and communication costs. The analysis has been implemented in Haskell. We consider different algorithms expressed in our language for some example problems and illustrate each BSP implementation, contrasting the analysis of their efficiency by traditional, intuitive methods with that achieved by our cost calculator. We test experimentally the accuracy of the calculator's cost predictions against the run times of real parallel programs. Previous shape-based cost analysis required all elements of a vector (our nestable bulk data structure) to have the same shape. We partially relax this strict requirement on data structure regularity by introducing new shape expressions in our analysis framework. We demonstrate that this allows us to achieve the first automated analysis of a complete derivation, that of the well-known maximum segment sum algorithm of Skillicorn and Cai.
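As a rough illustration of the kind of quantity such an analysis predicts, the sketch below evaluates the standard BSP cost formula, in which a superstep costs w + h·g + l (maximum local work, plus maximum per-processor message volume times the gap g, plus the barrier latency l). It is not the cost calculator described above, which is implemented in Haskell and derives these quantities statically from shapes; the names `Machine`, `Superstep`, `superstep_cost` and `program_cost` are hypothetical, and the per-processor counts are supplied by hand rather than deduced.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Machine:
    """BSP machine parameters (illustrative values, not measurements)."""
    g: float  # cost per word communicated, relative to one local operation
    l: float  # cost of one barrier synchronisation


@dataclass(frozen=True)
class Superstep:
    """Hand-supplied per-processor counts for one superstep (an assumption)."""
    work: List[int]      # local operations performed by each processor
    sent: List[int]      # words sent by each processor
    received: List[int]  # words received by each processor


def superstep_cost(step: Superstep, m: Machine) -> float:
    # w: the slowest processor determines the computation time.
    w = max(step.work)
    # h: the largest fan-in or fan-out over all processors (an h-relation).
    h = max(max(s, r) for s, r in zip(step.sent, step.received))
    return w + h * m.g + m.l


def program_cost(steps: List[Superstep], m: Machine) -> float:
    # A BSP program is a sequence of supersteps, so their costs add up.
    return sum(superstep_cost(s, m) for s in steps)


if __name__ == "__main__":
    machine = Machine(g=2.0, l=10.0)
    steps = [
        Superstep(work=[5, 7], sent=[3, 1], received=[1, 3]),
        Superstep(work=[4, 4], sent=[0, 0], received=[0, 0]),
    ]
    print(program_cost(steps, machine))
```

Because the cost depends only on per-superstep maxima and the two machine parameters, two candidate implementations of the same program can be compared for a given machine without running either of them.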
