Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs
[1] Jack J. Dongarra, et al. A Fast Batched Cholesky Factorization on a GPU, 2014, 2014 43rd International Conference on Parallel Processing.
[2] Jack Dongarra, et al. Model-Driven One-Sided Factorizations on Multicore Accelerated Systems, 2014, Supercomput. Front. Innov.
[3] Jack J. Dongarra, et al. Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs, 2016, IEEE Transactions on Parallel and Distributed Systems.
[4] Antonino Tumeo, et al. Accelerating subsurface transport simulation on heterogeneous clusters, 2013, 2013 IEEE International Conference on Cluster Computing (CLUSTER).
[5] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[6] Jack J. Dongarra, et al. Batched matrix computations on hardware accelerators based on GPUs, 2015, Int. J. High Perform. Comput. Appl.
[7] Jack J. Dongarra, et al. Dense linear algebra solvers for multicore with GPU accelerators, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[8] Allen D. Malony, et al. Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs, 2011, 2011 International Conference on Parallel Processing.
[9] Jack J. Dongarra, et al. A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations, 2015, ISC.
[10] Jack Dongarra, et al. Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs, 2010.
[11] Massimiliano Fatica, et al. Power/Performance Trade-Offs of Small Batched LU Based Solvers on GPUs, 2013, Euro-Par.
[12] James Demmel, et al. LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs, 2008.
[13] Jack J. Dongarra, et al. Towards batched linear solvers on accelerated hardware platforms, 2015, PPOPP.