GPU computing performance analysis on matrix multiplication

Machine learning is widely used in intelligent data mining. Its high computational complexity and the large data volumes involved present challenges to computing platforms. The graphics processing unit (GPU) provides powerful computing support for machine learning, but its performance varies with the computing scale and the development method. Analysing the performance of GPUs in different application scenarios therefore helps to improve computing performance. In this study, matrix multiplication, a common and time-consuming operation in machine learning, is performed at different data scales and with different development methods to analyse the relationship between GPU computing performance and both matrix scale and development method. The experimental data show that, for small-scale computations, the GPU offers little performance improvement over the central processing unit (CPU). In addition, developing for the GPU through a high-level application programming interface (API) is less computationally efficient than using the GPU programming language compute unified device architecture (CUDA) C.