Preliminary Performance Evaluation of Application Kernels Using ARM SVE with Multiple Vector Lengths

Modern high-performance processors are equipped with very wide SIMD instruction sets. SVE (Scalable Vector Extension) is an ARM® SIMD technology that supports vector lengths from 128 bits to 2048 bits. One of its promising features is "vector-length agnostic" programming, which allows the same SVE code to run on hardware of any vector length without modification. Because the same vectorized SIMD code can be reused, this feature is useful for exploring the space of hardware parameter combinations to find the vector length that makes the most efficient use of the available hardware resources. In this paper, we report the performance of application kernels using ARM SVE with multiple vector lengths while keeping the hardware resources the same. We confirmed that when the performance of a program is limited by a long chain of arithmetic operations or by instruction issue, increasing the vector length can improve performance. However, this improvement requires a sufficient number of physical registers; when there are too few physical registers, the performance of such a program may instead be reduced. When performance is limited by the access bandwidth to cache and memory, the vector length does not affect performance significantly.
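The idea of vector-length agnostic programming can be illustrated with a small sketch. The following is not SVE code and is not taken from the evaluation; it is a plain Python emulation, under the assumption of 64-bit elements, of a predicated loop whose stride is the vector length. The names `whilelt` (borrowed loosely from the SVE instruction of that name) and `vla_daxpy` are hypothetical helpers introduced only for this illustration.

```python
def whilelt(i, n, lanes):
    """Emulate a loop predicate: lane k is active while i + k < n."""
    return [i + k < n for k in range(lanes)]


def vla_daxpy(a, x, y, vl_bits):
    """Compute a*x + y in chunks of one emulated vector register."""
    # Assumption: vector lengths are multiples of 128 bits, from 128 to 2048.
    assert vl_bits % 128 == 0 and 128 <= vl_bits <= 2048
    lanes = vl_bits // 64  # number of 64-bit elements per emulated vector
    n = len(x)
    out = list(y)
    i = 0
    while i < n:
        pred = whilelt(i, n, lanes)
        for k in range(lanes):
            if pred[k]:  # inactive lanes in the loop tail are skipped
                out[i + k] = a * x[i + k] + out[i + k]
        i += lanes  # the stride is the vector length, not a fixed constant
    return out


if __name__ == "__main__":
    x = [float(v) for v in range(11)]
    y = [1.0] * 11
    reference = [2.0 * xv + yv for xv, yv in zip(x, y)]
    # The same loop body is reused unchanged for every vector length.
    for vl_bits in range(128, 2049, 128):
        assert vla_daxpy(2.0, x, y, vl_bits) == reference
```

The loop never names a fixed lane count: the stride and the tail predicate are both derived from `vl_bits`, so the same code yields the same result at every vector length, and only the number of iterations changes.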
