DiVIT: Algorithm and architecture co-design of differential attention in vision transformer
Kenli Li | Yikun Hu | Yangfan Li | Fan Wu
[1] Tae Jun Ham,et al. ELSA: Hardware-Software Co-design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[2] Michael T. Niemier,et al. In-Memory Computing based Accelerator for Transformer Networks for Long Sequences , 2021, 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[3] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[4] Hanrui Wang,et al. SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning , 2020, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[5] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[6] Deog-Kyoon Jeong,et al. A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[7] Kenli Li,et al. FlinkCL: An OpenCL-Based In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data , 2018, IEEE Transactions on Computers.
[8] Mostafa Mahmoud,et al. Diffy: a Déjà vu-Free Differential Deep Neural Network Accelerator , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[9] Zhuo Tang,et al. GPU-Accelerated Parallel Hierarchical Extreme Learning Machine on Flink for Big Data , 2017, IEEE Transactions on Systems, Man, and Cybernetics: Systems.
[10] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[11] Andreas Moshovos,et al. Bit-Pragmatic Deep Neural Network Computing , 2016, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[12] Kenli Li,et al. GFlink: An In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data , 2016, IEEE Transactions on Parallel and Distributed Systems.
[13] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[14] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[15] Onur Mutlu,et al. Ramulator: A Fast and Extensible DRAM Simulator , 2016, IEEE Computer Architecture Letters.
[16] Patrick Judd,et al. Stripes: Bit-serial deep neural network computing , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[17] Joel Emer,et al. Eyeriss: an Energy-efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks , 2022.