Paper information - GPU computing performance analysis on matrix multiplication

Machine learning is widely used in intelligent data mining. Its high computational complexity and large data volumes pose challenges for computing platforms. The graphics processing unit (GPU) provides powerful computing support for machine learning, but its performance varies with the scale of the computation and with the development method used. Analysing GPU performance in different application scenarios therefore helps to improve computing performance. In this study, matrix multiplication, a common and time-consuming operation in machine learning, is performed at different data scales and with different development methods to analyse how GPU computing performance relates to matrix scale and development method. The experimental data show that, for small-scale data, the GPU offers little improvement over the central processing unit (CPU). In addition, developing for the GPU through a high-level application programming interface (API) is less computationally efficient than using the GPU programming language Compute Unified Device Architecture (CUDA) C.
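The abstract does not reproduce the authors' benchmark code, so the following is only a minimal sketch of how a scale-versus-performance measurement of this kind might be set up. It times a naive CPU matrix multiplication at several matrix sizes and converts each timing to throughput using the conventional count of about 2n^3 floating-point operations for an n x n product. The names `matmul`, `make_matrix`, and `benchmark` are hypothetical, and the sketch runs on the CPU only; a GPU implementation would be passed in as the `multiply` argument to compare the two.

```python
import time


def matmul(a, b):
    """Naive product of two square matrices stored as lists of lists."""
    # Transpose b once so each output entry is a dot product of two sequences.
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def make_matrix(n, value):
    """Build an n x n matrix filled with a constant (hypothetical test data)."""
    return [[value] * n for _ in range(n)]


def benchmark(multiply, sizes, repeats=3):
    """Return {n: (best_seconds, gflops)} for each matrix size n in sizes."""
    results = {}
    for n in sizes:
        a = make_matrix(n, 1.0)
        b = make_matrix(n, 2.0)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            multiply(a, b)
            best = min(best, time.perf_counter() - start)
        best = max(best, 1e-12)  # guard against a zero reading from the timer
        flops = 2 * n ** 3  # about n^3 multiplications plus n^3 additions
        results[n] = (best, flops / best / 1e9)
    return results


if __name__ == "__main__":
    for n, (seconds, gflops) in benchmark(matmul, [16, 32, 64]).items():
        print(f"n={n:4d}  time={seconds:.6f}s  throughput={gflops:.4f} GFLOP/s")
```

Taking the best of several repeats reduces noise from other processes. When a GPU routine is timed this way, the host-to-device transfer should be included or excluded consistently, since for small matrices that overhead can dominate the measurement.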
