Making Static Pivoting Scalable and Dependable

Solving square linear systems of equations Ax = b is one of the primary workhorses in scientific computing. With asymptotically and practically small amounts of extra calculation and higher precision, we can render solution techniques dependable. We produce a solution with tiny error for almost all systems where we should expect a tiny error, and we correctly flag potential failures. Our method uses a proven technique: iterative refinement. We extend prior work by applying extra precision not only in calculating the residual b - A y_i of an intermediate solution y_i but also in carrying that intermediate solution y_i. Analysis shows that extra precision in the intermediate solutions lowers the limiting backward error (measuring perturbations in the initial problem) to levels that produce a forward error (measuring perturbations in the solution) not much larger than the precision used to store the result. We also demonstrate that condition estimation is not necessary for determining success, reducing the computation in refinement substantially. This basic, dependable solver applies to typical dense LU factorization methods using partial pivoting as well as to methods that risk greater failure by choosing pivots for nonnumerical reasons. Sparse factorization methods may choose pivots to promote structural sparsity, or even choose pivots before factorization to decouple the phases. We show through experiments that solutions using these restrictive pivoting methods still have small error so long as an estimate of factorization quality, the growth factor, does not grow too large. Our refinement algorithm dependably flags such failures. Additionally, we find a better choice of heuristic for sparse static pivoting than the defaults in Li and Demmel's SuperLU package. Static pivoting in a distributed-memory setting needs an algorithm for choosing pivots that does not rely on fitting the entire matrix into one memory space.
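The refinement scheme described above can be sketched in a few lines. This is an illustrative sketch, not the dissertation's implementation: float32 stands in for the low working precision of the factorization, float64 for the extra precision used both to compute the residual b - A y_i and to carry the intermediate solution y_i, and `np.linalg.solve` stands in for reusing one stored LU factorization. The name `refined_solve` and the fixed step count are hypothetical; a real implementation would apply a backward-error stopping test as discussed in the text.

```python
import numpy as np

def refined_solve(A, b, steps=5):
    """Sketch of extra-precise iterative refinement.

    float32 models the working precision of the factorization; float64
    models the extra precision used both for the residual b - A*y_i and
    for carrying the intermediate solution y_i."""
    A32 = A.astype(np.float32)
    A64 = A.astype(np.float64)
    b64 = b.astype(np.float64)
    # Initial solve entirely in the low working precision.
    y = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(steps):
        r = b64 - A64 @ y                                # extra-precise residual
        dy = np.linalg.solve(A32, r.astype(np.float32))  # low-precision correction
        y = y + dy.astype(np.float64)                    # solution carried extra-precisely
    return y
```

Each pass solves only for a correction in the low precision, so the added cost per step is a pair of triangular solves, while the residual and the running solution retain the higher precision.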
We investigate a set of algorithms, Bertsekas's auction algorithms, for choosing static pivots via maximum-weight perfect bipartite matching. Auction algorithms have a natural mapping to distributed-memory computation through their bidding mechanism. We provide an analysis of the auction algorithm that fits it comfortably within linear optimization theory and characterizes approximately maximum-weight perfect bipartite matchings. These approximately maximum-weight perfect matchings work well as static pivot choices and can be computed much more quickly than the exact maximum-weight matching. Finally, we consider the performance of auction algorithm implementations on a suite of real-world sparse problems. Sequential performance is roughly equivalent to that of existing implementations like Duff and Koster's MC64, but varies widely with different parameter and input settings. Parallel performance is even more unpredictable. Computing approximately maximum-weight matchings helps performance somewhat, but we still conclude that the performance is too variable for a black-box solution method.
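A minimal sequential sketch of the auction mechanism makes the bidding idea concrete. This is an assumed toy version for a dense n-by-n weight matrix with `auction_assignment` as a hypothetical name; epsilon-scaling is omitted for brevity, so the computed matching is only guaranteed optimal to within n*eps, which is exactly the approximately maximum-weight regime the text discusses.

```python
def auction_assignment(weights, eps=0.01):
    """Bertsekas-style auction for the (maximum-weight) assignment problem.

    Each unassigned "person" i bids for its best "object" j at current
    prices; eps > 0 forces every bid to raise a price, guaranteeing
    termination, at the cost of n*eps suboptimality."""
    n = len(weights)
    prices = [0.0] * n
    owner = [-1] * n       # owner[j]: person currently holding object j
    assigned = [-1] * n    # assigned[i]: object held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Value of each object to bidder i at current prices.
        values = [weights[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=values.__getitem__)
        if n > 1:
            second = max(values[j] for j in range(n) if j != best)
        else:
            second = values[best]
        # Bid up the best object's price by its margin over the runner-up.
        prices[best] += values[best] - second + eps
        if owner[best] != -1:          # evict the previous holder
            assigned[owner[best]] = -1
            unassigned.append(owner[best])
        owner[best] = i
        assigned[i] = best
    return assigned
```

Because each bid touches only one bidder's row and one object's price, bidders can compute bids independently and merge them afterward, which is the natural mapping to distributed memory mentioned above.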

[1]  G. Stewart.  Introduction to Matrix Computations, 1973.

[2]  Patrick Amestoy,et al.  MUMPS : A General Purpose Distributed Memory Sparse Solver , 2000, PARA.

[3]  I. Duff,et al.  Towards a stable static pivoting strategy for the sequential and parallel solution of sparse symmetric indefinite systems , 2005 .

[4]  H. Markowitz The Elimination form of the Inverse and its Application to Linear Programming , 1957 .

[5]  Hadley Wickham,et al.  ggplot2 - Elegant Graphics for Data Analysis (2nd Edition) , 2017 .

[6]  Vijay Saraswat,et al.  Report on the Programming Language X10 , 2010 .

[7]  Md. Mostofa Ali Patwary,et al.  Parallel algorithms for bipartite matching problems on distributed memory computers , 2011, Parallel Comput..

[8]  James Demmel,et al.  LAPACK Users' Guide, Third Edition , 1999, Software, Environments and Tools.

[9]  Rainer E. Burkard,et al.  Linear Assignment Problems and Extensions , 1999, Handbook of Combinatorial Optimization.

[10]  N. Higham Iterative refinement for linear systems and LAPACK , 1997 .

[11]  Edmond Chow,et al.  Combinatorial Algorithms for Computing Column Space Bases That Have Sparse Inverses , 2005 .

[12]  H. Wozniakowski,et al.  Iterative refinement implies numerical stability , 1977 .

[13]  R. Skeel Iterative refinement implies numerical stability for Gaussian elimination , 1980 .

[14]  John von Neumann,et al.  1. A Certain Zero-sum Two-person Game Equivalent to the Optimal Assignment Problem , 1953 .

[15]  G. van Rossum.  Python Reference Manual, 1995.

[16]  Philip E. Gill,et al.  Practical optimization , 1981 .

[17]  Richard J. Anderson,et al.  On the parallel implementation of Goldberg's maximum flow algorithm , 1992, SPAA '92.

[18]  Dimitri P. Bertsekas,et al.  Auction algorithms for network flow problems: A tutorial introduction , 1992, Comput. Optim. Appl..

[19]  J. M. Wein,et al.  Massively parallel auction algorithms for the assignment problem , 1990, [1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation.

[20]  Elizabeth Eskow,et al.  A Revised Modified Cholesky Factorization Algorithm , 1999, SIAM J. Optim..

[21]  Nicholas J. Higham,et al.  Backward Error and Condition of Structured Linear Systems , 1992, SIAM J. Matrix Anal. Appl..

[22]  Guido D. Salvucci,et al.  IEEE Standard for Binary Floating-Point Arithmetic , 1985 .

[23]  Ümit V. Çatalyürek,et al.  Hypergraph-based Dynamic Load Balancing for Adaptive Scientific Computations , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[24]  James Demmel,et al.  Design, implementation and testing of extended and mixed precision BLAS , 2000, TOMS.

[25]  Peter Deuflhard,et al.  Numerische Mathematik. I , 2002 .

[26]  Jack J. Dongarra,et al.  A set of level 3 basic linear algebra subprograms , 1990, TOMS.

[27]  G. Forsythe,et al.  Computer solution of linear algebraic systems , 1969 .

[28]  Dimitri P. Bertsekas,et al.  Dual coordinate step methods for linear network flow problems , 1988, Math. Program..

[29]  Siegfried M. Rump,et al.  Solving Algebraic Problems with High Accuracy , 1983, IMACS World Congress.

[30]  Tobin A. Driscoll,et al.  Searching for Rare Growth Factors Using Multicanonical Monte Carlo Methods , 2007, SIAM Rev..

[31]  Svatopluk Poljak,et al.  Checking robust nonsingularity is NP-hard , 1993, Math. Control. Signals Syst..

[32]  Ambuj K. Singh,et al.  Towards index-based similarity search for protein structure databases , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.

[33]  Siegfried M. Rump,et al.  Accurate Sum and Dot Product , 2005, SIAM J. Sci. Comput..

[34]  John W. Eaton,et al.  Gnu Octave Manual , 2002 .

[35]  Richard Bellman,et al.  Introduction to Matrix Analysis , 1972 .

[36]  Xiaoye S. Li,et al.  Algorithms for quad-double precision floating point arithmetic , 2000, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001.

[37]  James Hardy Wilkinson,et al.  Rounding errors in algebraic processes , 1964, IFIP Congress.

[38]  Iain S. Duff,et al.  Algorithm 575: Permutations for a Zero-Free Diagonal [F1] , 1981, TOMS.

[39]  J. Munkres.  Algorithms for the Assignment and Transportation Problems , 1957 .

[40]  Sivan Toledo,et al.  A Supernodal Out-of-Core Sparse Gaussian-Elimination Method , 2007, PPAM.

[41]  Richard M. Karp,et al.  Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems , 1972, Combinatorial Optimization.

[42]  Nicholas J. Higham,et al.  INVERSE PROBLEMS NEWSLETTER , 1991 .

[43]  Nader Bagherzadeh,et al.  Parallel implementation of the auction algorithm on the Intel hypercube , 1992, Proceedings Sixth International Parallel Processing Symposium.

[44]  James Demmel,et al.  Extra-Precise Iterative Refinement for Overdetermined Least Squares Problems , 2009, TOMS.

[45]  Bora Uçar,et al.  Towards parallel bipartite matching algorithms , 2009 .

[46]  Andrzej Kiełbasiński,et al.  Iterative refinement for linear systems in variable-precision arithmetic , 1981 .

[47]  Nicholas J. Higham,et al.  A Block Algorithm for Matrix 1-Norm Estimation, with an Application to 1-Norm Pseudospectra , 2000, SIAM J. Matrix Anal. Appl..

[48]  Nicholas J. Higham,et al.  Implementation for LAPACK of a Block Algorithm for Matrix 1-Norm Estimation , 2001 .

[49]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.

[50]  D. Ruiz.  A Scaling Algorithm to Equilibrate Both Rows and Columns Norms in Matrices , 2001 .

[51]  James Demmel,et al.  Fast matrix multiplication is stable , 2006, Numerische Mathematik.

[52]  Nicholas J. Higham,et al.  FORTRAN codes for estimating the one-norm of a real or complex matrix, with applications to condition estimation , 1988, TOMS.

[53]  Jack Dongarra,et al.  LINPACK Users' Guide , 1987 .

[54]  Marina Marino,et al.  Interval linear systems: the state of the art , 2006, Comput. Stat..

[55]  James Demmel,et al.  Applied Numerical Linear Algebra , 1997 .

[56]  Pavel Tvrdík,et al.  Distributed Memory Auction Algorithms for the Linear Assignment Problem , 2002, IASTED PDCS.

[57]  Jack J. Dongarra,et al.  An extended set of FORTRAN basic linear algebra subprograms , 1988, TOMS.

[58]  James Demmel,et al.  Fast linear algebra is stable , 2006, Numerische Mathematik.

[59]  Hossam A. Zaki A comparison of two algorithms for the assignment problem , 1995, Comput. Optim. Appl..

[60]  Joseph W. H. Liu,et al.  On threshold pivoting in the multifrontal method for sparse indefinite systems , 1987, TOMS.

[61]  James Demmel Underflow and the Reliability of Numerical Software , 1984 .

[62]  Rainer E. Burkard,et al.  Selected topics on assignment problems , 2002, Discret. Appl. Math..

[63]  Eugene Lawler,et al.  Combinatorial optimization , 1976 .

[64]  IEEE standard for binary floating-point arithmetic - IEEE standard 754-1985 , 1985 .

[65]  Timothy A. Davis,et al.  A column approximate minimum degree ordering algorithm , 2000, TOMS.

[66]  N. Higham,et al.  Large growth factors in Gaussian elimination with pivoting , 1989 .

[67]  A. Neumaier,et al.  A NEW PIVOTING STRATEGY FOR GAUSSIAN ELIMINATION , 1996 .

[68]  J. Dongarra,et al.  Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems) , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[69]  James Demmel,et al.  SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems , 2003, TOMS.

[70]  J. Demmel On condition numbers and the distance to the nearest ill-posed problem , 2015 .

[71]  Timothy A. Davis,et al.  The university of Florida sparse matrix collection , 2011, TOMS.

[72]  Iain S. Duff,et al.  On Algorithms For Permuting Large Entries to the Diagonal of a Sparse Matrix , 2000, SIAM J. Matrix Anal. Appl..

[73]  J. L. Rigal,et al.  On the Compatibility of a Given Solution With the Data of a Linear System , 1967, JACM.

[74]  W. Prager,et al.  Compatibility of approximate solution of linear equations with given error bounds for coefficients and right-hand sides , 1964 .

[75]  Siegfried M. Rump Ill-Conditioned Matrices Are Componentwise Near to Singularity , 1999, SIAM Rev..

[76]  Dimitri P. Bertsekas,et al.  Parallel synchronous and asynchronous implementations of the auction algorithm , 1991, Parallel Comput..


[78]  V. Strassen Gaussian elimination is not optimal , 1969 .

[79]  I. Duff,et al.  Direct Methods for Sparse Matrices , 1987 .

[80]  Patrick R. Amestoy,et al.  Analysis and comparison of two general sparse solvers for distributed memory computers , 2001, TOMS.

[81]  Jozef Gruska Foundations of Computing , 1997 .

[82]  Siegfried M. Rump A class of arbitrarily ill conditioned floating-point matrices , 1991 .

[83]  N. S. Mendelsohn,et al.  Matrices Associated With the Hitchcock Problem , 1962, JACM.

[84]  Jennifer A. Scott,et al.  A fast and robust mixed-precision solver for the solution of sparse symmetric linear systems , 2010, TOMS.

[85]  J. Demmel Open problems in numerical linear algebra , 1992 .

[86]  James Demmel,et al.  The Componentwise Distance to the Nearest Singular Matrix , 1992, SIAM J. Matrix Anal. Appl..

[87]  Cleve B. Moler,et al.  Iterative Refinement in Floating Point , 1967, JACM.

[88]  Nicholas J. Higham,et al.  Modifying the inertia of matrices arising in optimization , 1998 .

[89]  Alan M. Frieze,et al.  A new rounding procedure for the assignment problem with applications to dense graph arrangement problems , 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[90]  Timothy A. Davis,et al.  Algorithm 832: UMFPACK V4.3---an unsymmetric-pattern multifrontal method , 2004, TOMS.

[91]  Christopher Umans,et al.  Group-theoretic Algorithms for Matrix Multiplication , 2005, FOCS.

[92]  Dimitri P. Bertsekas,et al.  Network optimization : continuous and discrete models , 1998 .

[93]  James Demmel,et al.  Error bounds from extra-precise iterative refinement , 2006, TOMS.

[94]  James Demmel,et al.  On the Complexity of Computing Error Bounds , 2001, Found. Comput. Math..

[95]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[96]  Alex Pothen,et al.  Computing the block triangular form of a sparse matrix , 1990, TOMS.

[97]  J. H. Wilkinson,et al.  Solution of real and complex systems of linear equations , 1966 .

[98]  Jiri Rohn Enclosing solutions of linear interval equations is NP-hard , 2006, Computing.

[99]  H. Kuhn The Hungarian method for the assignment problem , 1955 .