Parallel Computer Algebra

Building and implementing parallel algorithms in computer algebra has been an important thread of research for more than a decade. It has been driven by the increasing availability of parallel architectures, ranging from dedicated machines to networks of workstations. New algorithms have been designed and implemented to meet high-performance computing challenges. The aim of this tutorial is to introduce parallel algorithms in computer algebra, from the design of an efficient algorithm to its effective implementation on a given architecture. Parallel computer algebra systems, which exploit the parallelism of an algorithm on a given architecture, play a central role in ensuring efficient execution. Because parallel programming models vary widely, these systems express parallelism in different ways, from data distribution to functional parallelism. After an introduction to algorithmic techniques and classical programming models, the tutorial focuses on parallel computer algebra systems, parallel linear algebra algorithms, and their effective implementations.

The tutorial is organized in four parts:

1. Efficient parallel algorithms. The major techniques used to build efficient algorithms on theoretical machine models are presented and illustrated on various basic computer algebra algorithms. Because memory access is non-uniform, communication complexity is a key point to take into account when analyzing an algorithm.
2. Programming models and scheduling. To combine expressive power and portability, several programming models have been proposed, from message passing to bulk-synchronous programming and functional languages. Emulating a model on a real machine has an inherent overhead, so each model is suited to a specific range of applications.
3. Parallel computer algebra systems. Various parallel systems have been proposed that couple a sequential computer algebra system with a parallel programming model. Their design is often guided by the classes of applications on which they have been tested.
4. Parallel linear algebra. The parallelization techniques introduced in the previous parts are illustrated on various research problems in parallel linear algebra: system solving, gcd, rank, and normal forms.
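The following example is not taken from the tutorial. It is a minimal sketch of the kind of analysis described in part 1. It multiplies n polynomials with a balanced binary tree, a basic computer algebra operation, and counts two quantities:

- the work W, which is the number of polynomial multiplications;
- the depth D, which is the number of tree levels, or parallel steps when processors are unbounded.

For simplicity, it assumes that every multiplication costs one unit and that all multiplications on the same level are independent. Under these assumptions, running each level on p processors takes at most W/p + D steps, which is Brent's scheduling principle. The function names (`poly_mul`, `tree_product`, `time_on_p`) are hypothetical. The code simulates the schedule sequentially rather than running in parallel.

```python
from math import ceil


def poly_mul(a, b):
    """Schoolbook product of dense coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def tree_product(polys):
    """Multiply polynomials pairwise, one tree level at a time.

    Returns (product, work, depth, widths) where
      work   = total number of multiplications (unit-cost assumption),
      depth  = number of levels (parallel steps with unboundedly many processors),
      widths = number of independent multiplications on each level.
    """
    if not polys:
        return [1], 0, 0, []
    level = list(polys)
    work = depth = 0
    widths = []
    while len(level) > 1:
        # All multiplications on this level are independent of each other.
        nxt = [poly_mul(level[k], level[k + 1]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd element passes to the next level unchanged
        mults = len(level) // 2
        widths.append(mults)
        work += mults
        depth += 1
        level = nxt
    return level[0], work, depth, widths


def time_on_p(widths, p):
    """Steps used by a level-by-level schedule on p processors.

    Each level of width w needs ceil(w / p) steps, so the total is at most
    sum(w / p + 1) = W / p + D (Brent's bound).
    """
    return sum(ceil(w / p) for w in widths)


if __name__ == "__main__":
    # (x - 1)(x - 2)...(x - 8), each factor stored as [constant, leading coefficient].
    factors = [[-i, 1] for i in range(1, 9)]
    prod, work, depth, widths = tree_product(factors)
    print(prod, work, depth, widths)
    for p in (1, 2, 4):
        print(p, time_on_p(widths, p), work / p + depth)
```

For the product (x - 1)(x - 2)...(x - 8), the tree performs W = 7 multiplications in D = 3 levels. With p = 2 processors, the level-by-level schedule needs 4 steps, which is within the bound 7/2 + 3. In a real implementation, the unit-cost assumption does not hold: operand degrees, and hence multiplication costs, grow toward the root of the tree. On distributed memory, the communication of those larger operands also has to be accounted for, as part 1 points out.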
