High-Performance Matrix Multiplication on the New Generation Shenwei Processor