DiVIT: Algorithm and architecture co-design of differential attention in vision transformer

[1] Tae Jun Ham, et al. ELSA: Hardware-Software Co-design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks, 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).

[2] Michael T. Niemier, et al. In-Memory Computing based Accelerator for Transformer Networks for Long Sequences, 2021, 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[3] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.

[4] Hanrui Wang, et al. SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning, 2020, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[5] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.

[6] Deog-Kyoon Jeong, et al. A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[7] Kenli Li, et al. FlinkCL: An OpenCL-Based In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data, 2018, IEEE Transactions on Computers.

[8] Mostafa Mahmoud, et al. Diffy: a Déjà vu-Free Differential Deep Neural Network Accelerator, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[9] Zhuo Tang, et al. GPU-Accelerated Parallel Hierarchical Extreme Learning Machine on Flink for Big Data, 2017, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[10] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[11] Andreas Moshovos, et al. Bit-Pragmatic Deep Neural Network Computing, 2016, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12] Kenli Li, et al. GFlink: An In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data, 2016, IEEE Transactions on Parallel and Distributed Systems.

[13] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Ninghui Sun, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, 2014, ASPLOS.

[15] Onur Mutlu, et al. Ramulator: A Fast and Extensible DRAM Simulator, 2016, IEEE Computer Architecture Letters.

[16] Patrick Judd, et al. Stripes: Bit-serial deep neural network computing, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[17] Joel Emer, et al. Eyeriss: an Energy-efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, 2022.