Parallel programming with tree skeletons

Parallel computing is an essential technique for dealing with large-scale problems. In recent years, although hardware for parallel computing has become widely available, developing parallel software remains a hard task for many programmers. The main difficulties stem from the communication, synchronization, and data distribution that parallel programs require. This thesis studies the theory and practice of parallel programming on trees based on parallel primitives called tree skeletons.

Trees are important data structures for representing structured data. However, their irregular and often ill-balanced shape makes it hard to develop efficient parallel programs on them. Naive divide-and-conquer computation, which processes the subtrees of each node in parallel, has a critical path that grows with the height of the tree, so it performs poorly on ill-balanced trees. To remedy this situation, this thesis develops a new framework for parallel programming on trees based on the programming model called skeletal parallel programming. Skeletal parallel programming, first proposed by Cole, encourages programmers to develop parallel programs by composing ready-made components called parallel skeletons (or algorithmic skeletons). A theory for designing parallel skeletons on lists has been developed based on constructive algorithmics, and several libraries of parallel skeletons have been built to bring the theory into practice. This thesis extends these ideas from lists to trees.

The thesis makes three main contributions. The first contribution is the design of parallel tree skeletons for both binary trees and general trees of arbitrary shape. Our parallel tree skeletons have a sequential interface but a parallel implementation: the interface is designed based on the theory of constructive algorithmics, while the implementation is based either on tree contraction algorithms or on newly developed ones. The second contribution is a set of theories for skeletal parallel programming on trees.
These theories provide a systematic method for deriving skeletal parallel programs from sequential ones. We illustrate the effectiveness of the method by solving two classes of nontrivial problems: maximum marking problems and XPath queries. The third contribution is an implementation of a parallel skeleton library for trees. We developed a new algorithm for implementing tree skeletons, in which a tree is divided with high locality and good load balance, and tree skeletons are executed efficiently in
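The following is a minimal Python sketch, assuming binary trees with an integer weight on every node. It illustrates two points made above: a skeleton can have a purely sequential interface, and a maximum marking problem can be expressed as a single tree reduction once per-subtree results are tupled. All names (`Leaf`, `Node`, `reduce_tree`, `max_independent_marking`, `height`, `left_spine`) are hypothetical illustrations, not the API of the thesis's library. The chosen problem, marking nodes so that no two adjacent nodes are marked while maximizing the marked weight, is one instance picked for this sketch. Only the sequential semantics is implemented here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, TypeVar, Union

R = TypeVar("R")  # result type of a reduction


@dataclass
class Leaf:
    value: int  # node weight (assumption: integer weights)


@dataclass
class Node:
    value: int
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def reduce_tree(leaf_fn: Callable[[int], R],
                node_fn: Callable[[R, int, R], R],
                tree: Tree) -> R:
    """Sequential semantics of a hypothetical binary-tree reduce skeleton:
         reduce_tree(f, g, Leaf(a))       == f(a)
         reduce_tree(f, g, Node(a, l, r)) == g(reduce_tree(f, g, l), a,
                                              reduce_tree(f, g, r))
    An explicit stack replaces recursion so that deep, ill-balanced trees
    do not hit Python's recursion limit."""
    results: List[R] = []                            # finished subtree results
    todo: List[Tuple[Tree, bool]] = [(tree, False)]  # (subtree, children done?)
    while todo:
        t, children_done = todo.pop()
        if isinstance(t, Leaf):
            results.append(leaf_fn(t.value))
        elif children_done:
            right = results.pop()
            left = results.pop()
            results.append(node_fn(left, t.value, right))
        else:
            # Revisit t after both children; the left child is processed first.
            todo.append((t, True))
            todo.append((t.right, False))
            todo.append((t.left, False))
    return results[0]


def max_independent_marking(tree: Tree) -> int:
    """Mark nodes so that no two marked nodes are adjacent, maximizing the
    total marked weight. Each subtree is summarized by the tuple
    (best total with its root marked, best total with its root unmarked)."""
    def leaf(w: int) -> Tuple[int, int]:
        return (w, 0)

    def node(l: Tuple[int, int], w: int, r: Tuple[int, int]) -> Tuple[int, int]:
        marked = w + l[1] + r[1]    # a marked root forces unmarked children
        unmarked = max(l) + max(r)  # an unmarked root leaves children free
        return (marked, unmarked)

    return max(reduce_tree(leaf, node, tree))


def height(tree: Tree) -> int:
    """Height of the tree. Naive divide-and-conquer that evaluates the two
    subtrees of each node in parallel has a critical path that grows
    linearly with this height."""
    return reduce_tree(lambda _: 1, lambda l, _, r: 1 + max(l, r), tree)


def left_spine(n: int) -> Tree:
    """A maximally ill-balanced tree: n internal nodes (weights n..1 from the
    root down) along the left spine, each with a weight-0 leaf on the right."""
    t: Tree = Leaf(0)
    for w in range(1, n + 1):
        t = Node(w, t, Leaf(0))
    return t
```

For example, with `t = Node(3, Node(5, Leaf(1), Leaf(4)), Leaf(2))`, `max_independent_marking(t)` returns 8 (marking 3, 1 and 4) and `height(t)` is 3. For `left_spine(n)`, however, `height` is n + 1. Evaluating both subtrees of every node in parallel therefore leaves a critical path as long as a sequential traversal, which is the ill-balanced case described above. This sketch does not attempt the tree-contraction-based parallel implementations that the thesis uses for that case.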

[1] Rajesh K. Mansharamani. Parallel Computing Using the Prefix Problem. 1995.

[2] Shigeru Chiba et al. A metaobject protocol for C++. OOPSLA, 1995.

[3] Masato Takeichi et al. Parallelization with Tree Skeletons. Euro-Par, 2003.

[4] Ernst W. Mayr et al. Optimal Tree Contraction and Term Matching on the Hypercube and Related Networks. Algorithmica, 1997.

[5] Selim G. Akl et al. Parallel Maximum Sum Algorithms on Interconnection Networks. 1999.

[6] Guy E. Blelloch et al. Vector Models for Data-Parallel Computing. 1990.

[7] Tadao Takaoka et al. Algorithms for the problem of K maximum sums and a VLSI algorithm for the K maximum subarrays problem. 7th International Symposium on Parallel Architectures, Algorithms and Networks, 2004.

[8] Richard S. Bird. Maximum marking problems. J. Funct. Program., 2001.

[9] Masato Takeichi et al. Diffusion: Calculating Efficient Parallel Programs. PEPM, 1999.

[10] Masato Takeichi et al. A Compositional Framework for Developing Parallel Programs on Two-Dimensional Arrays. International Journal of Parallel Programming, 2007.

[11] P. Dew et al. Parallel CSG, Skeletons and Performance Modelling. 1996.

[12] Yike Guo et al. Parallelizing Conditional Recurrences. Euro-Par, Vol. I, 1996.

[13] Gary L. Miller et al. Tree-Based Parallel Algorithm Design. Algorithmica, 1997.

[14] Akihiko Takano et al. Tupling calculation eliminates multiple data traversals. ICFP '97, 1997.

[15] Masato Takeichi et al. Design and Implementation of General Tree Skeletons. Mathematical Engineering Technical Reports, 2005.

[16] Murray Cole et al. Parallel Programming with List Homomorphisms. Parallel Process. Lett., 1995.

[17] Guy E. Blelloch et al. Implementation of a portable nested data-parallel language. PPOPP '93, 1993.

[18] Jeremy Gibbons. Generic downwards accumulations. Sci. Comput. Program., 2000.

[19] David B. Skillicorn et al. Foundations of parallel programming. 1995.

[20] Kevin Lü et al. Parallel processing XML documents. Proceedings International Database Engineering and Applications Symposium, 2002.

[21] Wei-Ngan Chin et al. Deriving Parallel Codes via Invariants. SAS, 2000.

[22] Afonso Ferreira et al. Efficient Parallel Graph Algorithms for Coarse-Grained Multicomputers and BSP. Algorithmica, 2002.

[23] Richard Cole et al. The accelerated centroid decomposition technique for optimal parallel tree evaluation in logarithmic time. Algorithmica, 2005.

[24] Sergei Gorlatch et al. Systematic Efficient Parallelization of Scan and Other List Homomorphisms. Euro-Par, Vol. II, 1996.

[25] Wojciech Rytter et al. An Optimal Parallel Algorithm for Dynamic Expression Evaluation and Its Applications. FSTTCS, 1986.

[26] Wei-Ngan Chin et al. A Type-Based Approach to Parallelization. 2003.

[27] Scott Boag et al. XQuery 1.0: An XML Query Language. 2007.

[28] Masato Takeichi et al. Generation of Efficient Programs for Solving Maximum Multi-marking Problems. SAIG, 2001.

[29] Richard S. Bird et al. Algebra of Programming. Prentice Hall International Series in Computer Science, 1997.

[30] Dan Suciu et al. Stream processing of XPath queries with predicates. SIGMOD '03, 2003.

[31] R. Bird. Introduction to Functional Programming Using Haskell, Second Edition. 1998.

[32] Herbert Kuchen et al. A Skeleton Library. Euro-Par, 2002.

[33] Wentong Cai et al. Efficient Parallel Algorithms for Tree Accumulations. Sci. Comput. Program., 1994.

[34] Ernst W. Mayr et al. Optimal Routing of Parentheses on the Hypercube. J. Parallel Distributed Comput., 1995.

[35] Manuel M. T. Chakravarty et al. Flattening Trees. Euro-Par, 1998.

[36] Mizuhito Ogawa et al. Make it practical: a generic linear-time algorithm for solving maximum-weightsum problems. ICFP '00, 2000.

[37] Edson Cáceres et al. BSP/CGM Algorithms for Maximum Subsequence and Maximum Subarray. PVM/MPI, 2004.

[38] John Hughes et al. Report on the Programming Language Haskell 98. 1999.

[39] Stephen R. Tate et al. Dynamic parallel tree contraction (extended abstract). SPAA '94, 1994.

[40] David B. Skillicorn et al. The Bird-Meertens Formalism as a Parallel Model. 1993.

[41] Masato Takeichi et al. A Uniform Approach toward Nested Parallelism. 2004.

[42] Krzysztof Diks et al. More General Parallel Tree Contraction: Register Allocation and Broadcasting in a Tree. Theoretical Computer Science, 1996.

[43] Murray Cole et al. Algorithmic Skeletons: Structured Management of Parallel Computation. 1989.

[44] Jorg Striegnitz. Making C++ Ready for Algorithmic Skeletons. 2000.

[45] Bruce M. Maggs et al. Communication-efficient parallel algorithms for distributed random-access machines. Algorithmica, 1988.

[46] Amar Mukherjee et al. Efficient parallel evaluation of CSG tree using fixed number of processors. Solid Modeling and Applications, 1993.

[47] Bernard Chazelle et al. A theorem on polygon cutting with applications. 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), 1982.

[48] Siau-Cheng Khoo et al. PType System: A Featherweight Parallelizability Detector. APLAS, 2004.

[49] Narsingh Deo et al. Parallel Algorithms for Maximum Subsequence and Maximum Subarray. Parallel Processing Letters, 2022.

[50] Zhenjiang Hu et al. A Fusion-Embedded Skeleton Library. Euro-Par, 2004.

[51] Wei-Ngan Chin et al. Parallelization via context preservation. Proceedings of the 1998 International Conference on Computer Languages, 1998.

[52] David B. Skillicorn. A Parallel Tree Difference Algorithm. Inf. Process. Lett., 1996.

[53] Guy E. Blelloch et al. NESL: A Nested Data-Parallel Language. 1992.

[54] David A. Bader et al. Evaluating Arithmetic Expressions Using Tree Contraction: A Fast and Scalable Parallel Implementation for Symmetric Multiprocessors (SMPs) (Extended Abstract). HiPC, 2002.

[55] Sergei Gorlatch et al. A Transformational Framework for Skeletal Programs: Overview and Case Study. IPPS/SPDP Workshops, 1999.

[56] Masato Takeichi et al. Formal Derivation of Parallel Program for 2-Dimensional Maximum Segment Sum Problem. Euro-Par, Vol. I, 1996.

[57] Tim Furche et al. XPath: Looking Forward. EDBT Workshops, 2002.

[58] Sanjay Ghemawat et al. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004.

[59] Harold S. Stone et al. A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. IEEE Transactions on Computers, 1973.

[60] Maarten M. Fokkinga. Tupling and Mutumorphisms. 1989.

[61] Masato Takeichi et al. Parallel skeletons for manipulating general trees. Parallel Comput., 2006.

[62] Björn Karlsson et al. Beyond the C++ Standard Library: An Introduction to Boost. 2005.

[63] Herbert Kuchen et al. Higher-order functions and partial applications for a C++ skeleton library. JGI '02, 2002.

[64] Masato Takeichi et al. Systematic Derivation of Tree Contraction Algorithms. Parallel Process. Lett., 2005.

[65] Gary L. Miller et al. Parallel Tree Contraction, Part 2: Further Applications. SIAM J. Comput., 1991.

[66] Richard S. Bird et al. An introduction to the theory of lists. 1987.

[67] William M. Pottenger et al. The role of associativity and commutativity in the detection and transformation of loop-level parallelism. ICS '98, 1998.

[68] George Horatiu Botorog et al. Skil: an imperative language with algorithmic skeletons for efficient distributed programming. Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing, 1996.

[69] Jeremy Gibbons et al. Computing Downwards Accumulations on Trees Quickly. Theor. Comput. Sci., 1996.

[70] Masato Takeichi et al. Construction of List Homomorphisms by Tupling and Fusion. MFCS, 1996.

[71] Zhenjiang Hu et al. A library of constructive skeletons for sequential style of parallel programming. InfoScale '06, 2006.

[72] Wei-Ngan Chin et al. Towards a Modular Program Derivation via Fusion and Tupling. GPCE, 2002.

[73] Masato Takeichi et al. Implementation of Parallel Tree Skeletons on Distributed Systems. APLAS, 2002.

[74] Salvatore Orlando et al. P3L: A structured high-level parallel language, and its structured support. Concurr. Pract. Exp., 1995.

[75] Wolf Pfannenstiel. Piecewise execution of nested data-parallel programs. 2001.

[76] Taisook Han et al. An Analytical Method for Parallelization of Recursive Functions. Parallel Processing Letters, 2022.

[77] Uzi Vishkin. A no-busy-wait balanced tree parallel algorithmic paradigm. SPAA '00, 2000.

[78] Philip Wadler et al. Deforestation: Transforming Programs to Eliminate Trees. Theoretical Computer Science, 1988.

[79] Masato Takeichi et al. List Homomorphism with Accumulation. SNPD, 2003.

[80] Kiminori Matsuzaki et al. Efficient Parallel Tree Reductions on Distributed Memory Environments. Scalable Comput. Pract. Exp., 2017.

[81] John H. Reif et al. Synthesis of Parallel Algorithms. 1993.

[82] Johan Jeuring. Theories for Algorithm Calculation. 1993.

[83] Allan L. Fisher et al. Parallelizing complex scans and reductions. PLDI '94, 1994.

[84] Gary L. Miller et al. Dynamic parallel complexity of computational circuits. STOC '87, 1987.

[85] Zhenjiang Hu et al. A New Parallel Skeleton for General Accumulative Computations. International Journal of Parallel Programming, 2004.

[86] Sergei Gorlatch et al. Towards Parallel Programming by Transformation: The FAN Skeleton Framework. Parallel Algorithms Appl., 2001.

[87] Sudarshan S. Chawathe et al. XPath queries on streaming data. SIGMOD '03, 2003.

[88] Rita Loogen et al. The Eden coordination model for distributed memory systems. Proceedings Second International Workshop on High-Level Parallel Programming Models and Supportive Environments, 1997.

[89] Massimo Torquati et al. The Implementation of ASSIST, an Environment for Parallel and Distributed Programming. Euro-Par, 2003.

[90] Kiminori Matsuzaki et al. Efficient Implementation of Tree Accumulations on Distributed-Memory Parallel Computers. International Conference on Computational Science, 2007.

[91] Craig A. Tovey et al. Automatic generation of linear-time algorithms from predicate calculus descriptions of problems on recursively constructed graph families. Algorithmica, 1992.

[92] Eugene L. Lawler et al. Linear-Time Computation of Optimal Subgraphs of Decomposable Graphs. J. Algorithms, 1987.

[93] David B. Skillicorn. Parallel Implementation of Tree Skeletons. J. Parallel Distributed Comput., 1996.

[94] David B. Skillicorn. Structured Parallel Computation in Structured Documents. 1995.

[95] Frank Harary et al. Graph Theory. 2016.

[96] Masato Takeichi et al. Formal derivation of efficient parallel programs by construction of list homomorphisms. ACM Trans. Program. Lang. Syst., 1997.

[97] S. Teng et al. Optimal Tree Contraction in the EREW Model. 1988.

[98] Masato Takeichi et al. Towards automatic parallelization of tree reductions in dynamic programming. SPAA '06, 2006.

[99] Zhaofang Wen. Fast Parallel Algorithms for the Maximum Sum Problem. Parallel Comput., 1995.

[100] Zhenjiang Hu et al. Efficient Implementation of Tree Skeletons on Distributed-Memory Parallel Computers. Mathematical Engineering Technical Reports, 2006.

[101] Sergei Gorlatch et al. Optimization rules for programming with collective operations. Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing (IPPS/SPDP 1999), 1999.

[102] Gary L. Miller et al. Parallel tree contraction and its application. 26th Annual Symposium on Foundations of Computer Science (SFCS 1985), 1985.

[103] David G. Kirkpatrick et al. A Simple Parallel Tree Contraction Algorithm. J. Algorithms, 1989.

[104] Jens Gustedt. Communication and Memory Optimized Tree Contraction and List Ranking. 2000.

[105] Masato Takeichi et al. An Accumulative Parallel Skeleton for All. APLAS, 2002.

[106] Keith H. Randall et al. Cilk: efficient multithreaded computing. 1998.

[107] Sergei Gorlatch et al. Patterns and Skeletons for Parallel and Distributed Computing. Springer London, 2002.

[108] P. Dew et al. A Skeleton for Parallel CSG with a Performance Model. 1997.

[109] Simon L. Peyton Jones et al. A short cut to deforestation. FPCA '93, 1993.

[110] Stephen Gilmore et al. Flexible Skeletal Programming with eSkel. Euro-Par, 2005.

[111] Bo Lu et al. Compiler optimization of implicit reductions for distributed memory multiprocessors. Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing, 1998.

[112] Maarten M. Fokkinga et al. Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire. FPCA, 1991.