Accelerating Sparse Cholesky Factorization on Sunway Manycore Architecture
Guangwen Yang | Hailong Yang | Zhongzhi Luan | Depei Qian | Lin Gan | Yi Liu | Mingzhen Li