Soft error resilient QR factorization for hybrid system with GPGPU

General-purpose graphics processing units (GPGPUs) are increasingly deployed for scientific computing because of their performance advantages over CPUs. As a result, fault tolerance has become a more serious concern than it was when GPUs were used exclusively for graphics applications. Using GPUs and CPUs together in a hybrid computing system increases flexibility and performance, but it also increases the likelihood that computations are affected by soft errors. In this work, we propose a soft-error-resilient algorithm for QR factorization on such hybrid systems. Our contributions include (1) a checkpointing and recovery mechanism for the left factor Q whose performance scales on hybrid systems; (2) optimized Givens rotation utilities on GPGPUs that efficiently reduce an upper Hessenberg matrix to upper triangular form for the protection of the right factor R; and (3) a recovery algorithm based on QR update on GPGPUs. Experimental results show that our fault-tolerant QR factorization can successfully detect and recover from soft errors in the entire matrix with little overhead on hybrid systems with GPGPUs.
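
To make the second contribution concrete, the following is a minimal CPU-side sketch of how Givens rotations reduce an upper Hessenberg matrix to upper triangular form. It is not the authors' GPGPU implementation: it is plain Python on lists of rows, the function names `givens` and `hessenberg_to_triangular` are hypothetical, and it assumes a square input whose entries below the first subdiagonal are already zero.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0  # nothing to eliminate
    r = math.hypot(a, b)  # sqrt(a*a + b*b) without intermediate overflow
    return a / r, b / r


def hessenberg_to_triangular(H):
    """Zero the subdiagonal of a square upper Hessenberg matrix.

    H is a list of rows. One rotation per column acts on rows k and k+1,
    so n-1 rotations are applied in total. Returns a new matrix; H is
    left unchanged.
    """
    n = len(H)
    R = [row[:] for row in H]
    for k in range(n - 1):
        c, s = givens(R[k][k], R[k + 1][k])
        # Columns left of k are already zero in both rows, so start at k.
        for j in range(k, n):
            top, bot = R[k][j], R[k + 1][j]
            R[k][j] = c * top + s * bot
            R[k + 1][j] = -s * top + c * bot
        R[k + 1][k] = 0.0  # remove rounding residue in the eliminated entry
    return R
```

Each rotation touches only two rows, so the whole reduction costs O(n^2) operations rather than the O(n^3) of a full refactorization. The sketch does not show how such rotations are scheduled or optimized on a GPGPU.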
