Nicholas J. Higham | Sivasankaran Rajamanickam | Stanimire Tomov | Piotr Luszczek | Ichitaro Yamazaki | Terry Cojean | Pratik Nayak | Hartwig Anzt | Thomas Grützmacher | Erik G. Boman | Barry Smith | Tobias Ribizel | Ahmad Abdelfattah | Kasia Swirydowicz | Erin C. Carson | Neil Lindquist | Jack J. Dongarra | Mark Gates | Sherry Li | Yang Liu | Jennifer A. Loe | Srikara Pranesh | Stephen J. Thomas | Yaohung M. Tsai | Ulrike Meier Yang
[1] Enrique S. Quintana-Ortí, et al. Toward a modular precision ecosystem for high-performance computing, 2019, Int. J. High Perform. Comput. Appl.
[2] Valeria Simoncini, et al. Theory of Inexact Krylov Subspace Methods and Applications to Scientific Computing, 2003, SIAM J. Sci. Comput.
[3] Jack J. Dongarra, et al. The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques, 2018, ICCS.
[4] Maximilian Emans, et al. Mixed-precision AMG as linear equation solver for definite systems, 2010, ICCS.
[5] Jack J. Dongarra, et al. Algorithm 589: SICEDR: A FORTRAN Subroutine for Improving the Accuracy of Computed Matrix Eigenvalues, 1982, TOMS.
[6] C. Paige. Accuracy and effectiveness of the Lanczos algorithm for the symmetric eigenproblem, 1980.
[7] Martin Kronbichler, et al. Multigrid for Matrix-Free High-Order Finite Element Computations on Graphics Processors, 2019, ACM Trans. Parallel Comput.
[8] H. Walker. Implementation of the GMRES method using Householder transformations, 1988.
[9] Miroslav Rozložník, et al. Modified Gram-Schmidt (MGS), Least Squares, and Backward Stability of MGS-GMRES, 2006, SIAM J. Matrix Anal. Appl.
[10] A. Greenbaum. Estimating the Attainable Accuracy of Recursively Computed Residual Methods, 1997, SIAM J. Matrix Anal. Appl.
[11] Nicholas J. Higham, et al. Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Inderjit S. Dhillon, et al. The design and implementation of the MRRR algorithm, 2006, TOMS.
[13] John L. Gustafson, et al. The End of Error: Unum Computing, 2015.
[14] Yusaku Yamamoto, et al. Roundoff error analysis of the Cholesky QR2 algorithm, 2015.
[15] Nicholas J. Higham, et al. Exploiting Lower Precision Arithmetic in Solving Symmetric Positive Definite Linear Systems and Least Squares Problems, 2019, SIAM J. Sci. Comput.
[16] Zdeněk Strakoš, et al. Residual and Backward Error Bounds in Minimum Residual Krylov Subspace Methods, 2001, SIAM J. Sci. Comput.
[17] Enrique S. Quintana-Ortí, et al. Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers, 2019, Concurr. Comput. Pract. Exp.
[18] Åke Björck. Iterative refinement of linear least squares solutions I, 1967.
[19] J. H. Wilkinson, et al. Improving the Accuracy of Computed Eigenvalues and Eigenvectors, 1983.
[20] Stephen F. McCormick, et al. Algebraic error analysis for mixed-precision multigrid solvers, 2020, SIAM J. Sci. Comput.
[21] James Demmel, et al. Error bounds from extra-precise iterative refinement, 2006, TOMS.
[22] G. Stewart. Introduction to matrix computations, 1973.
[23] Y. Saad, et al. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, 1986.
[24] Jack Dongarra, et al. GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems, 2019.
[25] Walter Gander, et al. Gram-Schmidt orthogonalization: 100 years and more, 2013, Numer. Linear Algebra Appl.
[26] Nicholas J. Higham, et al. Squeezing a Matrix into Half Precision, with an Application to Solving Linear Systems, 2019, SIAM J. Sci. Comput.
[27] Erin Carson, et al. Three-Precision GMRES-Based Iterative Refinement for Least Squares Problems, 2020, SIAM J. Sci. Comput.
[28] Å. Björck. Solving linear least squares problems by Gram-Schmidt orthogonalization, 1967.
[29] Åke Björck, et al. Iterative refinement of linear least squares solutions II, 1967.
[30] Nicholas J. Higham, et al. The accuracy of solutions to triangular systems, 1989.
[31] G. Meurant, et al. The Lanczos and conjugate gradient algorithms in finite precision arithmetic, 2006, Acta Numerica.
[32] Terry Cojean, et al. A customized precision format based on mantissa segmentation for accelerating sparse linear algebra, 2019, Concurr. Comput. Pract. Exp.
[33] Takeshi Ogita, et al. Iterative refinement for symmetric eigenvalue decomposition II: clustered eigenvalues, 2019, Japan Journal of Industrial and Applied Mathematics.
[34] Théo Mary, et al. Sharper Probabilistic Backward Error Analysis for Basic Linear Algebra Kernels with Random Data, 2020, SIAM J. Sci. Comput.
[35] Martin Kronbichler, et al. Multigrid for matrix-free finite element computations on graphics processors, 2017.
[36] Sivasankaran Rajamanickam, et al. Amesos2 and Belos: Direct and iterative solvers for large sparse linear systems, 2012, Sci. Program.
[37] C. Puglisi. Modification of the Householder method based on the compact WY representation, 1992.
[38] Nicholas J. Higham, et al. Simulating Low Precision Floating-Point Arithmetic, 2019, SIAM J. Sci. Comput.
[39] Qiang Ye, et al. Residual Replacement Strategies for Krylov Subspace Iterative Methods for the Convergence of True Residuals, 2000, SIAM J. Sci. Comput.
[40] Nicholas J. Higham, et al. A New Analysis of Iterative Refinement and Its Application to Accurate Solution of Ill-Conditioned Sparse Linear Systems, 2017, SIAM J. Sci. Comput.
[41] Kipton Barros, et al. Solving lattice QCD systems of equations using mixed precision solvers on GPUs, 2009, Comput. Phys. Commun.
[42] Serge Gratton, et al. Exploiting variable precision in GMRES, 2019, arXiv.
[43] Hyoukjun Kwon, et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects, 2018, ASPLOS.
[44] Vivienne Sze, et al. Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks, 2018, arXiv.
[45] Jack Dongarra, et al. Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs, 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[46] Xiaobai Sun, et al. Aggregations of Elementary Transformations, 1996.
[47] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[48] Julien Langou, et al. A Rank-k Update Procedure for Reorthogonalizing the Orthogonal Factor from Modified Gram-Schmidt, 2004, SIAM J. Matrix Anal. Appl.
[49] Christopher C. Paige, et al. The Effects of Loss of Orthogonality on Large Scale Numerical Computations, 2018, ICCSA.
[50] James Demmel, et al. Extra-Precise Iterative Refinement for Overdetermined Least Squares Problems, 2009, TOMS.
[51] Bora Uçar, et al. A Symmetry Preserving Algorithm for Matrix Scaling, 2014, SIAM J. Matrix Anal. Appl.
[52] Stanimire Tomov, et al. Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation, 2019, 2019 IEEE/ACM Workshop on Exascale MPI (ExaMPI).
[53] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[54] Jack Dongarra, et al. Design and Implementation for FFT-ECP on Distributed Accelerated Systems, 2019.
[55] Teruo Tanaka, et al. Mixed-Precision AMG method for Many Core Accelerators, 2014, EuroMPI/ASIA.
[56] Pritish Narayanan, et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[57] Takeshi Ogita, et al. Iterative refinement for symmetric eigenvalue decomposition, 2018.
[58] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[59] Sebastian Schöps, et al. GPU-accelerated mixed precision algebraic multigrid preconditioners for discrete elliptic field problems, 2014.
[60] Nicholas J. Higham, et al. Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores, 2020, SIAM J. Sci. Comput.
[61] H. A. van der Vorst. Residual Replacement Strategies for Krylov Subspace Iterative Methods for the Convergence of True Residuals, 2000.
[62] Allan Peter Engsig-Karup, et al. A Fast GPU-Accelerated Mixed-Precision Strategy for Fully Nonlinear Water Wave Computations, 2013.
[63] Nicholas Higham. Error Analysis for Standard and GMRES-Based Iterative Refinement in Two and Three-Precisions, 2019.
[64] Ed Anderson, et al. LAPACK Users' Guide, 1995.
[65] Jack J. Dongarra, et al. Investigating half precision arithmetic to accelerate dense linear system solvers, 2017, ScalA@SC.
[66] Karl Rupp, et al. Preparing sparse solvers for exascale computing, 2020, Philosophical Transactions of the Royal Society A.
[67] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[68] Nicholas J. Higham, et al. A New Approach to Probabilistic Rounding Error Analysis, 2019, SIAM J. Sci. Comput.
[69] Gerard L. G. Sleijpen, et al. Inexact Krylov Subspace Methods for Linear Systems, 2004, SIAM J. Matrix Anal. Appl.
[70] Julien Langou, et al. A note on the error analysis of classical Gram-Schmidt, 2006, Numerische Mathematik.
[71] Erin Carson, et al. Communication-Avoiding Krylov Subspace Methods in Theory and Practice, 2015.
[72] Yoshua Bengio, et al. Training deep neural networks with low precision multiplications, 2014.
[73] Peter Lindstrom, et al. Error Analysis of ZFP Compression for Floating-Point Data, 2018, SIAM J. Sci. Comput.
[74] Nicholas J. Higham, et al. Accelerating the Solution of Linear Systems by Iterative Refinement in Three Precisions, 2018, SIAM J. Sci. Comput.
[75] Takateru Yamagishi, et al. GPU Acceleration of a Non-hydrostatic Ocean Model with a Multigrid Poisson/Helmholtz Solver, 2016, ICCS.
[76] Christopher C. Paige, et al. Loss and Recapture of Orthogonality in the Modified Gram-Schmidt Algorithm, 1992, SIAM J. Matrix Anal. Appl.
[77] Stanimire Tomov, et al. Accelerating 2D FFT: Exploit GPU Tensor Cores through Mixed-Precision, 2018.
[78] Stephen F. McCormick, et al. Discretization-error-accurate mixed-precision multigrid solvers, 2020, arXiv.
[79] James Demmel, et al. Applied Numerical Linear Algebra, 1997.
[80] Nikolaos V. Sahinidis, et al. Scaling linear optimization problems prior to application of the simplex method, 2012, Comput. Optim. Appl.
[81] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[82] Cleve B. Moler, et al. Iterative Refinement in Floating Point, 1967, JACM.
[83] Christopher C. Paige, et al. Properties of a Unitary Matrix Obtained from a Sequence of Normalized Vectors, 2014, SIAM J. Matrix Anal. Appl.
[84] Tze Meng Low, et al. Accumulating Householder transformations, revisited, 2006, TOMS.
[85] W. Prager, et al. Compatibility of approximate solution of linear equations with given error bounds for coefficients and right-hand sides, 1964.
[86] Prabhat, et al. Exascale Deep Learning for Climate Analytics, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[87] Yusaku Yamamoto, et al. Shifted Cholesky QR for Computing the QR Factorization of Ill-Conditioned Matrices, 2018, SIAM J. Sci. Comput.
[88] Jesse L. Barlow, et al. Block Modified Gram-Schmidt Algorithms and Their Analysis, 2019, SIAM J. Matrix Anal. Appl.
[89] A. Greenbaum. Behavior of slightly perturbed Lanczos and conjugate-gradient recurrences, 1989.
[90] M. Rozložník. Numerics of Gram-Schmidt orthogonalization, 2007.
[91] Nicholas J. Higham, et al. Inverse Problems Newsletter, 1991.
[92] J. Malard, et al. Efficiency and scalability of two parallel QR factorization algorithms, 1994, Proceedings of IEEE Scalable High Performance Computing Conference.
[93] Enrique S. Quintana-Ortí, et al. Improved Accuracy and Parallelism for MRRR-Based Eigensolvers: A Mixed Precision Approach, 2013, SIAM J. Sci. Comput.
[94] Stanimire Tomov, et al. Optimizing the Fast Fourier Transform Using Mixed Precision on Tensor Core Hardware, 2018, 2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW).
[95] Jack J. Dongarra, et al. Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs, 2015, SIAM J. Sci. Comput.
[96] Gerard L. G. Sleijpen, et al. Reliable updated residuals in hybrid Bi-CG methods, 1996, Computing.
[97] J. H. Wilkinson. Rounding Errors in Algebraic Processes, 1964, Nature.
[98] Jack J. Dongarra, et al. Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU, 2016, ACM Trans. Math. Softw.