Parallel systems in symbolic and algebraic computation

This thesis describes techniques that exploit the distributed memory of massively parallel processors to satisfy the peak memory requirements of some very large computer algebra problems. Our aim is to achieve balanced memory use, which differentiates this work from other parallel systems whose focus is on gaining speedup. It is widely observed that failures in computer algebra systems are mostly due to memory overload: for several problems in computer algebra, some of the best available algorithms suffer from intermediate expression swell, where the result is of reasonable size but the intermediate calculation encounters severe memory limitations. This observation motivates our memory-centric approach to parallelizing computer algebra algorithms. The memory balancing is based on a randomized hashing algorithm for dynamic distribution of data. Dynamic distribution means that intermediate data is allocated storage space at the time it is created, so the system can avoid overloading individual processing elements. Large-scale computer algebra problems with peak memory demands of more than 10 gigabytes are considered. Distributed memory can scale to satisfy these requirements: for example, the Hitachi SR2201, which is the target architecture in this research, provides up to 56 gigabytes of memory. The system has fine granularity: task sizes are small and data is partitioned into small blocks. The fine granularity provides flexibility in controlling memory balance but incurs higher communication costs. The communication overhead is reduced by an intelligent scheduler that overlaps communication with computation asynchronously. The implementation provides a polynomial algebra system with operations on multivariate polynomials and matrices with polynomial entries. Within this framework it is possible to find computations with large memory demands, for example, solving large sparse systems of linear equations and Gröbner basis computations.
The parallel algorithms that have been implemented are based on the standard algorithms for polynomial algebra. This demonstrates that careful attention to memory management aids the solution of very large problems even without the benefit of advanced algorithms. The parallel implementation can be used to solve larger problems than have previously been possible.
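
The idea of balancing memory by randomized hashing can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the hash function or the data layout, so the linear hash family, the sparse term representation, and the names `make_hash` and `distribute` are all assumptions made for illustration. Each polynomial term is assigned to a processing element (PE) at the moment it is created, and like terms always hash to the same PE, so they can be combined locally.

```python
import random

# A large Mersenne prime used as the modulus of the hash family (assumption).
PRIME = (1 << 61) - 1


def make_hash(num_pes, seed):
    """Return a randomly chosen function mapping an exponent tuple to a PE id.

    Hypothetical helper: the hash family ((a*x + b) mod p) mod n is an
    illustrative choice, not the function used in the thesis.
    """
    rng = random.Random(seed)
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)

    def owner(exponents):
        # Fold the exponent vector into a single integer key.
        key = 0
        for e in exponents:
            key = (key * 1_000_003 + e) % PRIME
        return ((a * key + b) % PRIME) % num_pes

    return owner


def distribute(terms, num_pes, seed=0):
    """Place each (exponents, coefficient) term on a PE as it is created.

    Returns one dict per PE mapping exponent tuples to coefficients.
    Like terms land on the same PE and are summed there; terms whose
    coefficients cancel are dropped.
    """
    owner = make_hash(num_pes, seed)
    stores = [dict() for _ in range(num_pes)]
    for exponents, coeff in terms:
        store = stores[owner(exponents)]
        total = store.get(exponents, 0) + coeff
        if total == 0:
            store.pop(exponents, None)
        else:
            store[exponents] = total
    return stores
```

For example, `distribute([((i, j), 1) for i in range(20) for j in range(20)], 8)` spreads 400 distinct terms over 8 simulated PEs; inspecting `[len(s) for s in stores]` shows how evenly the chosen hash balances them. In a real distributed-memory setting each store would live on a separate node and placement would involve message passing, which this single-process sketch omits.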

[57]  Hans Schönemann,et al.  Monomial representations for Gröbner bases computations , 1998, ISSAC '98.

[58]  H. Heinrich,et al.  E. Kreyszig, Advanced Engineering Mathematics. IX + 856 S. m. 402 Abb. New York/London 1963. John Wiley and Sons, Inc. Preis geb. 79/‐ , 1964 .

[59]  Jack Dongarra,et al.  MPI: The Complete Reference , 1996 .

[60]  Yasuhiro Inagami,et al.  Deadlock-free fault-tolerant routing in the multi-dimensional crossbar network and its implementation for the Hitachi SR2201 , 1997, Proceedings 11th International Parallel Processing Symposium.

[61]  Richard P. Brent,et al.  Some Parallel Algorithms for Integer Factorisation , 1999, Euro-Par.

[62]  John A. Sharp A brief introduction to data flow , 1992 .

[63]  George E. Collins,et al.  Subresultants and Reduced Polynomial Remainder Sequences , 1967, JACM.

[64]  Victor Y. Pan,et al.  Processor efficient parallel solution of linear systems over an abstract field , 1991, SPAA '91.

[65]  Yuzo Takamatsu,et al.  Exponetiation in Finite Fields Using Dual Basis Multiplier , 1990, AAECC.

[66]  Erich Kaltofen,et al.  FOXBOX: a system for manipulating symbolic objects in black box representation , 1998, ISSAC '98.

[67]  Laurent Bernardin Maple on a massively parallel, distributed memory machine , 1997, PASCO '97.

[68]  Michael T. McClellan,et al.  The Exact Solution of Systems of Linear Equations with Polynomial Coefficients , 1973, JACM.

[69]  Yoshiko Yasuda,et al.  Architecture and performance of the Hitachi SR2201 massively parallel processor system , 1997, Proceedings 11th International Parallel Processing Symposium.

[70]  Gene Cooperman,et al.  STAR/MPI: binding a parallel library to interactive symbolic algebra systems , 1995, ISSAC '95.

[71]  Barry S. Fagin Fast Addition of Large Integers , 1992, IEEE Trans. Computers.

[72]  Thomas Fahringer,et al.  A Uniied Symbolic Evaluation Framework for Parallelizing Compilers , 1999 .

[73]  Alyson Reeves A Parallel Implementation of Buchberger's Algorithm over Zp for p <= 31991 , 1998, J. Symb. Comput..

[74]  Thomas L. Sterling,et al.  A Coming of Age for Beowulf-Class Computing , 1999, Euro-Par.

[75]  Bob Francis,et al.  Silicon Graphics Inc. , 1993 .

[76]  G. A. Geist PVM 3 beyond network computing , 1993 .

[77]  Jean-Louis Roch An Environment for Parallel Algebraic Computation , 1990, CAP.

[78]  Robert H. Halstead,et al.  Parallel Symbolic Computing , 1986, Computer.

[79]  J. Smit,et al.  A cancellation free algorithm, with factoring capabilities, for the efficient solution of large sparse sets of equations , 1981, SYMSAC '81.

[80]  Donald E. Knuth,et al.  Sorting and Searching , 1973 .

[81]  Joseph F. Traub,et al.  On Euclid's Algorithm and the Theory of Subresultants , 1971, JACM.

[82]  Tateaki Sasaki,et al.  Efficient Gaussian Elimination Method for Symbolic Determinants and Linear Systems , 1982, TOMS.

[83]  Wolfgang Küchlin,et al.  A case study of multi-threaded Gröbner basis completion , 1996, ISSAC '96.

[84]  Giovanni Cesari,et al.  CALYPSO: a computer algebra library for parallel symbolic computation , 1997, PASCO '97.

[85]  Winfried Neun,et al.  Implementation of the LISP-Arbitrary Precision Arithmetic for a Vector Processor. , 1988 .

[86]  Hans Schönemann,et al.  MPP: a framework for distributed polynomial computations , 1996, ISSAC '96.

[87]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[88]  Thomas Sturm,et al.  Approaches to parallel quantifier elimination , 1998, ISSAC '98.

[89]  B. F. Caviness,et al.  Future Directions for Research in Symbolic Computation , 1990 .

[90]  B. F. Caviness,et al.  Computer Algebra: Past and Future , 1985, J. Symb. Comput..

[91]  Jean-Louis Roch,et al.  Parallel computer algebra (tutorial) , 1997, ISSAC 1997.

[92]  V ZimaEugene Mixed representation of polynomials oriented towards fast parallel shift , 1997 .

[93]  Eyal Kushilevitz,et al.  An Ω(D log(N/D)) lower bound for broadcast in radio networks , 1993, PODC '93.

[94]  Geoffrey C. Fox,et al.  Scheduling regular and irregular communication patterns on the CM-5 , 1992, Proceedings Supercomputing '92.

[95]  J. D. Lipson Elements of algebra and algebraic computing , 1981 .

[96]  Douglas H. Wiedemann Solving sparse linear equations over finite fields , 1986, IEEE Trans. Inf. Theory.

[97]  Carlo Traverso,et al.  “One sugar cube, please” or selection strategies in the Buchberger algorithm , 1991, ISSAC '91.

[98]  Paul Feautrier,et al.  Scheduling reductions , 1994, ICS '94.

[99]  Katherine A. Yelick,et al.  Implementing an irregular application on a distributed memory multiprocessor , 1993, PPOPP '93.

[100]  Jack J. Dongarra,et al.  A message passing standard for MPP and workstations , 1996, CACM.

[101]  Philip K. McKinley,et al.  A dominating set model for broadcast in all-port wormhole-routed 2D mesh networks , 1994, ICS '94.

[102]  Robert G. Tobey Experience with FORMAC algorithm design , 1966, CACM.

[103]  Tudor Jebelean Integer and Rational Arithmetic on MasPar , 1996, DISCO.

[104]  H. T. Kung,et al.  A Regular Layout for Parallel Adders , 1982, IEEE Transactions on Computers.

[105]  P. L. Montgomery,et al.  An FFT extension of the elliptic curve method of factorization , 1992 .

[106]  Bruce W. Char Progress report on a system for general-purpose parallel symbolic algebraic computation , 1990, ISSAC '90.

[107]  John ffitch,et al.  The Bath concurrent LISP machine , 1983, EUROCAL.

[108]  Per-Åke Larson,et al.  File organization: implementation of a method guaranteeing retrieval in one access , 1984, CACM.

[109]  Arjan J. C. van Gemund,et al.  The importance of synchronization structure in parallel program optimization , 1997, ICS '97.

[110]  Alok Aggarwal,et al.  Communication Complexity of PRAMs , 1990, Theor. Comput. Sci..

[111]  György E. Révész Lambda-calculus, combinators, and functional programming , 1988, Cambridge tracts in theoretical computer science.

[112]  John ffitch Can REDUCE be Run in Parallel? , 1989, ISSAC.

[113]  Jack Dongarra,et al.  Pvm: A Users' Guide and Tutorial for Network Parallel Computing , 1994 .

[114]  Katherine A. Yelick,et al.  Portable Parallel Irregular Applications , 1995, PSLS.

[115]  Ellis Horowitz,et al.  On Computing the Exact Determinant of Matrices with Polynomial Entries , 1975, JACM.

[116]  Michael J. Flynn,et al.  Some Computer Organizations and Their Effectiveness , 1972, IEEE Transactions on Computers.

[117]  Stuart J. Berkowitz,et al.  On Computing the Determinant in Small Parallel Time Using a Small Number of Processors , 1984, Inf. Process. Lett..