High-performance implementation method for the BLAS (Basic Linear Algebra Subprograms) Level 3 function GEMM on the SW platform

The invention provides a high-performance implementation method for the BLAS (Basic Linear Algebra Subprograms) Level 3 function GEMM on the SW platform. Targeting the domestic SW1600 platform, the method adopts a three-layer code design framework of "interface-driver-kernel assembly core code". Assembly-level manual optimization is carried out using techniques tied to the platform architecture, including multiply-add instructions, loop unrolling, software pipelining with instruction rearrangement, SIMD (Single Instruction Multiple Data) vector operations, and register blocking. This solves the problem that a compiler cannot sufficiently optimize the compute-intensive function GEMM, and greatly improves function performance. Compared with the open-source BLAS math library GotoBLAS, the method achieves an average speedup of 4.72 and a maximum speedup of 5.61.
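The three-layer structure and two of the named techniques (register blocking and loop unrolling around multiply-add updates) can be illustrated with a minimal sketch in portable C. This is not the invention's SW1600 assembly code: the function names (dgemm_sketch, gemm_driver, kernel_4x4, kernel_edge), the 4x4 tile size, and the row-major, non-transposed storage are all assumptions made for illustration. Software pipelining and SIMD vector instructions are platform specific and are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel layer (hypothetical stand-in for the assembly core).
 * Computes one 4x4 tile of C = alpha*A*B + beta*C. The 16 accumulators
 * form the register block; each update is one multiply-add. */
static void kernel_4x4(size_t k, double alpha,
                       const double *a, size_t lda,
                       const double *b, size_t ldb,
                       double beta, double *c, size_t ldc)
{
    double acc[4][4] = {{0.0}};
    for (size_t p = 0; p < k; ++p) {
        /* Load four elements of A once and reuse each for four updates. */
        const double a0 = a[0 * lda + p], a1 = a[1 * lda + p];
        const double a2 = a[2 * lda + p], a3 = a[3 * lda + p];
        const double *bp = b + p * ldb;
        /* The four rows are unrolled by hand; the j loop has a fixed
         * trip count of 4, so a compiler may unroll or vectorize it. */
        for (int j = 0; j < 4; ++j) {
            acc[0][j] += a0 * bp[j];
            acc[1][j] += a1 * bp[j];
            acc[2][j] += a2 * bp[j];
            acc[3][j] += a3 * bp[j];
        }
    }
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            c[i * ldc + j] = alpha * acc[i][j] + beta * c[i * ldc + j];
}

/* Scalar fallback for partial tiles at the right and bottom edges. */
static void kernel_edge(size_t m, size_t n, size_t k, double alpha,
                        const double *a, size_t lda,
                        const double *b, size_t ldb,
                        double beta, double *c, size_t ldc)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[i * lda + p] * b[p * ldb + j];
            c[i * ldc + j] = alpha * sum + beta * c[i * ldc + j];
        }
}

/* Driver layer: walks C in 4x4 tiles and dispatches to a kernel. */
static void gemm_driver(size_t m, size_t n, size_t k, double alpha,
                        const double *a, size_t lda,
                        const double *b, size_t ldb,
                        double beta, double *c, size_t ldc)
{
    for (size_t i = 0; i < m; i += 4) {
        const size_t mb = (m - i < 4) ? m - i : 4;
        for (size_t j = 0; j < n; j += 4) {
            const size_t nb = (n - j < 4) ? n - j : 4;
            if (mb == 4 && nb == 4)
                kernel_4x4(k, alpha, a + i * lda, lda, b + j, ldb,
                           beta, c + i * ldc + j, ldc);
            else
                kernel_edge(mb, nb, k, alpha, a + i * lda, lda, b + j, ldb,
                            beta, c + i * ldc + j, ldc);
        }
    }
}

/* Interface layer: checks arguments and fixes the storage convention.
 * Assumes densely packed row-major matrices: A is m x k, B is k x n,
 * C is m x n. */
void dgemm_sketch(size_t m, size_t n, size_t k, double alpha,
                  const double *a, const double *b,
                  double beta, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    gemm_driver(m, n, k, alpha, a, k, b, n, beta, c, n);
}
```

In this sketch each loaded element of A is reused for four updates, which is the purpose of register blocking: it raises the ratio of arithmetic to memory accesses. A tuned implementation would typically also pack A and B into contiguous buffers in the driver layer, a step omitted here for brevity.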