DiVIT: Algorithm and architecture co-design of differential attention in vision transformer

Kenli Li | Yikun Hu | Yangfan Li | Fan Wu

[1] Tae Jun Ham, et al. ELSA: Hardware-Software Co-design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks, 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).

[2] Michael T. Niemier, et al. In-Memory Computing based Accelerator for Transformer Networks for Long Sequences, 2021, 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[3] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.

[4] Hanrui Wang, et al. SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning, 2020, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[5] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.

[6] Deog-Kyoon Jeong, et al. A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[7] Kenli Li, et al. FlinkCL: An OpenCL-Based In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data, 2018, IEEE Transactions on Computers.

[8] Mostafa Mahmoud, et al. Diffy: a Déjà vu-Free Differential Deep Neural Network Accelerator, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[9] Zhuo Tang, et al. GPU-Accelerated Parallel Hierarchical Extreme Learning Machine on Flink for Big Data, 2017, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[10] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[11] Andreas Moshovos, et al. Bit-Pragmatic Deep Neural Network Computing, 2016, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12] Kenli Li, et al. GFlink: An In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data, 2016, IEEE Transactions on Parallel and Distributed Systems.

[13] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Ninghui Sun, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, 2014, ASPLOS.

[15] Onur Mutlu, et al. Ramulator: A Fast and Extensible DRAM Simulator, 2016, IEEE Computer Architecture Letters.

[16] Patrick Judd, et al. Stripes: Bit-serial deep neural network computing, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[17] Joel Emer, et al. Eyeriss: an Energy-efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, 2022.