PhD Abstracts

From personal computers with an increasing number of cores to supercomputers with millions of computing units, parallel architectures are now the norm. High-performance architectures are usually referred to as hierarchical, since they are built from clusters of multiprocessors, each of which contains multiple cores. Programming such architectures is notoriously difficult: writing a parallel program is usually hard in both the algorithmic and the implementation phases. To address these concerns, many structured models and languages have been proposed to improve both expressiveness and efficiency. Among them, Multi-BSP is a bridging model dedicated to hierarchical architectures that ensures efficiency, execution safety, scalability and cost prediction. It extends the well-known BSP model, which targets flat architectures.

In this thesis we introduce the Multi-ML language, which allows Multi-BSP algorithms to be programmed “à la ML”. It thus guarantees the properties of the Multi-BSP model, and its ML type system guarantees execution safety. To handle the multi-level execution model of Multi-ML, we define a formal semantics that describes the valid evaluation of an expression. To ensure the execution safety of Multi-ML programs, we also propose a type system that preserves replicated coherence. An abstract machine is defined to formally describe the evaluation of a Multi-ML program on a Multi-BSP architecture. An implementation of the language is available as a compilation toolchain. This makes it possible to generate efficient parallel code from a Multi-ML program and execute it on any hierarchical machine.
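The cost-prediction property mentioned above can be illustrated with a minimal OCaml sketch. The flat part uses the standard BSP superstep cost w + h·g + L. The hierarchical part is only an assumption made for illustration: one superstep per level, with the hypothetical fields `gap`, `latency` and `words`. It shows how costs might compose along the machine tree. It is not the Multi-BSP or Multi-ML cost model defined in the thesis.

```ocaml
(* Classic flat BSP cost: each superstep costs w + h * g + l, where
   w is the maximal local work, h the maximal number of words sent or
   received by one processor, g the communication gap and l the
   barrier synchronisation latency. *)
type superstep = { work : int; comm : int }

let bsp_cost ~g ~l steps =
  List.fold_left (fun acc s -> acc + s.work + (s.comm * g) + l) 0 steps

(* Hypothetical hierarchical estimate (a simplification, NOT the exact
   Multi-BSP or Multi-ML cost formula): a leaf only computes; a node
   waits for its slowest child, then pays one exchange of [words] words
   at [gap] per word, plus one synchronisation costing [latency]. *)
type machine =
  | Leaf of int
  | Node of { gap : int; latency : int; words : int; children : machine list }

let rec multi_cost = function
  | Leaf work -> work
  | Node { gap; latency; words; children } ->
      let slowest =
        List.fold_left (fun m c -> max m (multi_cost c)) 0 children
      in
      slowest + (words * gap) + latency
```

For example, consider an inner node with `gap = 1`, `latency = 5`, `words = 2` and leaves of work 60 and 90. That node costs 90 + 2 + 5 = 97. Placed under a root with `gap = 3`, `latency = 20`, `words = 4` next to leaves of work 100 and 80, the whole tree costs 100 + 12 + 20 = 132.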

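The multi-level execution model described in the abstract can be pictured as a recursive traversal of the machine tree. The sketch below is plain, sequential OCaml, not Multi-ML syntax. The names `tree`, `Core`, `Group`, `hier_reduce` and `hier_sum` are hypothetical. They only mimic how a computation descends to the leaves and how each node combines its children's results once they have synchronised.

```ocaml
(* A hierarchical machine: leaves (cores) hold data, inner nodes group
   sub-components. *)
type 'a tree =
  | Core of 'a list          (* data held by one core *)
  | Group of 'a tree list    (* a node and its sub-components *)

(* Recursive multi-level evaluation: each leaf computes on its own data;
   each node collects the results of all its children (standing in for
   one synchronisation at that level) and combines them. Evaluation is
   sequential here; only the tree structure is modelled. *)
let rec hier_reduce ~leaf ~node = function
  | Core data -> leaf data
  | Group children -> node (List.map (hier_reduce ~leaf ~node) children)

(* Example instance: a hierarchical sum. *)
let hier_sum t =
  hier_reduce ~leaf:(List.fold_left ( + ) 0) ~node:(List.fold_left ( + ) 0) t
```

Separating the leaf code from the node code loosely echoes the distinction between what runs on cores and what runs on the nodes that coordinate them. For instance, `hier_sum (Group [ Group [ Core [ 1; 2 ]; Core [ 3 ] ]; Core [ 4; 5; 6 ] ])` evaluates to 21.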