The ARM Scalable Vector Extension

This article describes the ARM Scalable Vector Extension (SVE). Several goals guided the design of the architecture. First was the need to extend the vector processing capability associated with the ARM AArch64 execution state to better address the computational requirements in domains such as high-performance computing, data analytics, computer vision, and machine learning. Second was the desire to introduce an extension that can scale across multiple implementations, both now and into the future, allowing CPU designers to choose the vector length most suitable for their power, performance, and area targets. Finally, the architecture should avoid imposing a software development cost as the vector length changes and where possible reduce it by improving the reach of compiler auto-vectorization technologies. SVE achieves these goals. It allows implementations to choose a vector register length between 128 and 2,048 bits. It supports a vector-length agnostic programming model that lets code run and scale automatically across all vector lengths without recompilation. Finally, it introduces several innovative features that begin to overcome some of the traditional barriers to autovectorization.

[1]  Nalini Vasudevan,et al.  FlexVec: auto-vectorization for irregular loops , 2016, PLDI.

[2]  Vikram S. Adve,et al.  The LLVM Compiler Framework and Infrastructure Tutorial , 2004, LCPC.

[3]  Avinash Sodani,et al.  Knights landing (KNL): 2nd Generation Intel® Xeon Phi processor , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).

[4]  Bashir M. Al-Hashimi,et al.  Advanced SIMD: Extending the reach of contemporary SIMD architectures , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[5]  Scott A. Mahlke,et al.  From SODA to scotch: The evolution of a wireless baseband processor , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[6]  Pradeep Dubey,et al.  Larrabee: A Many-Core x86 Architecture for Visual Computing , 2009, IEEE Micro.

[7]  Richard M. Russell,et al.  The CRAY-1 computer system , 1978, CACM.