Adaptive Triangular System Solving

Large-scale applications and software systems are becoming increasingly complex. To cope with this complexity, such systems must manage themselves according to high-level guidance from humans. Adaptive and hybrid algorithms enable this self-management of resources and of structured inputs. In this talk, we first propose a classification of the different notions of adaptivity. For us, an algorithm is adaptive (or a poly-algorithm) when there is a high-level choice between at least two distinct algorithms, each of which could solve the same problem. The choice is strategic, not tactical: it is motivated by improved execution performance, depending on both the input/output data and the computing resources.

We then propose a new adaptive algorithm for the exact simultaneous solution of several triangular systems over finite fields. Solving such systems is, for example, one of the two main operations in block Gaussian elimination. Over finite fields, the block algorithm for triangular solving reduces to matrix multiplication and achieves the best known algebraic complexity. Moreover, exact matrix multiplication, together with matrix factorizations, over finite fields can now be performed at the speed of the highly optimized numerical BLAS routines, as established by the FFLAS and FFPACK libraries. We propose several practical variants for solving these systems: a pure recursive version, a reduction to the numerical dtrsm routine, and a version that delays the modular reduction. A cascading scheme then merges these variants into an adaptive sequential algorithm.

Finally, we propose a parallelization of this solver. The adaptive sequential algorithm is not the best parallel algorithm, since its recursion induces sequential dependencies. A better parallel algorithm first inverts the matrix and then multiplies this inverse by the right-hand side; unfortunately, this requires more total operations than the adaptive algorithm. We therefore couple the sequential algorithm with the parallel one in order to obtain the best performance on any number of processors. The resulting cascade is an adaptation to resources. The same process thus serves for adaptation both to data and to resources, and we conclude with a generic framework for the automatic adaptation of algorithms using recursive cascading.
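As a concrete illustration of the recursive cascade, here is a minimal sketch in Python/NumPy. It assumes a word-size prime `P`, int64 matrices with entries already reduced mod `P`, and a hypothetical crossover `threshold`; the forward-substitution base case with one delayed reduction per row only stands in for the dtrsm-based and delayed-modulus variants of the talk, and the function name is ours, not the FFLAS-FFPACK API.

```python
import numpy as np

P = 65521  # a word-size prime, chosen here purely for illustration

def trsm_mod_p(L, B, threshold=64):
    """Solve L @ X = B (mod P), with L lower triangular and its
    diagonal invertible mod P. L and B are int64 NumPy arrays with
    entries in [0, P); B has shape (n, m), one column per system.

    The recursion reduces the solve to matrix multiplication; below
    `threshold` (a hypothetical crossover) it falls back to forward
    substitution, delaying the modular reduction until each row's
    dot product is fully accumulated."""
    n = L.shape[0]
    if n <= threshold:
        X = np.zeros_like(B)
        for i in range(n):
            acc = (B[i] - L[i, :i] @ X[:i]) % P   # one reduction per row
            X[i] = (acc * pow(int(L[i, i]), -1, P)) % P
        return X
    k = n // 2
    X1 = trsm_mod_p(L[:k, :k], B[:k], threshold)   # top block, recursively
    B2 = (B[k:] - L[k:, :k] @ X1) % P              # the one matrix product
    X2 = trsm_mod_p(L[k:, k:], B2, threshold)      # bottom block, recursively
    return np.vstack([X1, X2])
```

A quick check on random inputs (all intermediate values fit in int64 because P squared times the dimension stays below 2^63):

```python
rng = np.random.default_rng(42)
n, m = 500, 4
L = np.tril(rng.integers(1, P, size=(n, n)))   # nonzero diagonal mod prime P
B = rng.integers(0, P, size=(n, m))
X = trsm_mod_p(L, B)
assert np.array_equal((L @ X) % P, B)
```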
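The resource adaptation can be sketched in the same style. With few processors, the work-optimal cascade above is used directly; with many, one can afford the extra operations of invert-then-multiply, since that single product parallelizes freely across right-hand sides. The worker cutoff below is a made-up placeholder, not the talk's actual tuning, and the inverse is obtained here simply by running the cascade against the identity.

```python
def adaptive_trsm(L, B, num_workers=1):
    """Couple the two strategies: the sequential cascade when workers
    are scarce, invert-then-multiply when parallelism can absorb the
    extra total operations. The cutoff of 4 is purely illustrative."""
    if num_workers < 4:
        return trsm_mod_p(L, B)
    n = L.shape[0]
    # Column j of L^{-1} solves L x = e_j, so the cascade itself
    # computes the inverse when handed the identity as right-hand side.
    Linv = trsm_mod_p(L, np.eye(n, dtype=np.int64))
    # This one product is embarrassingly parallel (e.g. a threaded BLAS
    # or a block-row split across workers would exploit it).
    return (Linv @ B) % P
```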
