Vis-TOP: Visual Transformer Overlay Processor

In recent years, the Transformer [23] has achieved strong results in Natural Language Processing (NLP) and has begun to expand into Computer Vision (CV), where excellent models such as the Vision Transformer [5] and Swin Transformer [17] have emerged. At the same time, Transformer models have been deployed on embedded devices to serve resource-sensitive application scenarios. However, the large number of parameters, the complex computational flow, and the many structural variants of Transformer models raise a number of issues that must be addressed in their hardware design; this is both an opportunity and a challenge. We propose Vis-TOP (Visual Transformer Overlay Processor), an overlay processor for various visual Transformer models. It differs both from coarse-grained overlay processors such as CPU, GPU, and NPE, and from fine-grained designs customized for a specific model. Vis-TOP summarizes the characteristics of all visual Transformer models and implements a three-layer, two-level transformation structure that allows the model to be switched or changed freely without modifying the hardware architecture. The corresponding instruction bundle and hardware architecture are designed around this three-layer, two-level transformation structure. After quantizing the Swin Transformer tiny model to 8-bit fixed point (fix_8), we implemented the overlay processor on the ZCU102. Compared to a GPU, Vis-TOP achieves 1.5x higher throughput; compared to existing Transformer accelerators, its throughput per DSP is 2.2x to 11.7x higher. In short, the approach in this paper meets the requirements of real-time AI in terms of both resource consumption and inference speed. Vis-TOP provides a cost-effective and power-efficient solution based on reconfigurable devices for computer vision at the edge.
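The fix_8 quantization mentioned in the abstract can be sketched as follows. This is a minimal illustration of signed 8-bit fixed-point (Qm.n) quantization with saturation, not the paper's actual implementation; the choice of fractional bits and rounding mode are assumptions for the example.

```python
def quantize_fix8(x: float, frac_bits: int) -> int:
    """Map a float to a signed 8-bit fixed-point value with
    `frac_bits` fractional bits, saturating to the int8 range."""
    scale = 1 << frac_bits
    q = int(round(x * scale))
    return max(-128, min(127, q))

def dequantize_fix8(q: int, frac_bits: int) -> float:
    """Recover the approximate float represented by a fix_8 value."""
    return q / (1 << frac_bits)

# Example: with 5 fractional bits (scale = 32), representable values
# lie in [-4.0, 3.96875] with a step of 1/32.
q = quantize_fix8(1.1, 5)          # round(1.1 * 32) = 35
approx = dequantize_fix8(q, 5)     # 35 / 32 = 1.09375
```

Replacing 32-bit floating-point weights and activations with such 8-bit values is what lets a design like this map matrix multiplications onto FPGA DSP slices efficiently, at the cost of the small rounding error visible above.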

[1] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.

[2] Jason Cong, et al. FPGA-based accelerator for long short-term memory recurrent neural networks, 2017, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC).

[3] Lei He, et al. OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks, 2020, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[4] Hongmei Li, et al. FPGA Based Real-Time Processing Architecture for Recurrent Neural Network, 2017.

[5] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.

[6] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[7] Lei He, et al. Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks, 2020, FPGA.

[8] Lei He, et al. Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks, 2020, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[9] Mohamed S. Abdelfattah, et al. DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration, 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).

[10] Siyuan Lu, et al. Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer, 2020, 2020 IEEE 33rd International System-on-Chip Conference (SOCC).

[11] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.

[12] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016.

[13] Jianfei Cai, et al. Scalable Vision Transformers with Hierarchical Pooling, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[14] Luciano Lavagno, et al. Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs, 2018, FPGA.

[15] Jungwook Choi, et al. OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator, 2020, MLSys.

[16] Zhijian Liu, et al. Lite Transformer with Long-Short Range Attention, 2020, ICLR.

[17] Jian Cheng, et al. Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing, 2021, 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[18] Soheil Ghiasi, et al. Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks, 2018, IEEE Transactions on Neural Networks and Learning Systems.

[19] Levent Sagun, et al. ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases, 2021, ICML.

[20] Ji Li, et al. FTRANS: energy-efficient acceleration of transformers using FPGA, 2020, ISLPED.

[21] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[22] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.

[23] Mehdi Kamal, et al. POLAR: A Pipelined/Overlapped FPGA-Based LSTM Accelerator, 2020, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[24] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.

[25] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.

[26] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[27] Lei He, et al. NPE: An FPGA-based Overlay Processor for Natural Language Processing, 2021, FPGA.

[28] Jason Cong, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, FPGA.