Accelerating Transducers through Adjacent Token Merging
Jinyu Li, Yu Wu, Shujie Liu, Yuang Li
[1] Cheng-Yang Fu, et al. Token Merging: Your ViT But Faster, 2022, ICLR.
[2] Hung-yi Lee, et al. On Compressing Sequences for Self-Supervised Speech Models, 2022, SLT.
[3] J. Chorowski, et al. Variable-rate hierarchical CPC leads to acoustic unit discovery in speech, 2022, NeurIPS.
[4] Michael W. Mahoney, et al. Squeezeformer: An Efficient Transformer for Automatic Speech Recognition, 2022, NeurIPS.
[5] Michael Auli, et al. On-demand compute reduction with stochastic wav2vec 2.0, 2022, INTERSPEECH.
[6] Jinyu Li. Recent Advances in End-to-End Automatic Speech Recognition, 2021, APSIPA Transactions on Signal and Information Processing.
[7] Hung-yi Lee, et al. DistilHuBERT: Speech Representation Learning by Layer-Wise Distillation of Hidden-Unit BERT, 2021, ICASSP.
[8] Valentin Vielzeuf, et al. Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition, 2021, ASRU.
[9] K. Keutzer, et al. Learned Token Pruning for Transformers, 2021, KDD.
[10] Takaaki Hori, et al. Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers, 2021, INTERSPEECH.
[11] Brian Kingsbury, et al. Advancing RNN Transducer Technology for Speech Recognition, 2021, ICASSP.
[12] Ryo Masumura, et al. Hierarchical Transformer-Based Large-Context End-to-End ASR with Large-Context Knowledge Distillation, 2021, ICASSP.
[13] Hanrui Wang, et al. SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning, 2020, HPCA.
[14] Andreas Schwarz, et al. Improving RNN-T ASR Accuracy Using Context Audio, 2020, INTERSPEECH.
[15] Jae-Jin Jeon, et al. Multitask Learning and Joint Optimization for Transformer-RNN-Transducer Speech Recognition, 2020, ICASSP.
[16] Kyunghyun Cho, et al. Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search, 2020, ACL.
[17] Xiao Chen, et al. Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition, 2020, INTERSPEECH.
[18] Qian Zhang, et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss, 2020, ICASSP.
[19] Anamitra R. Choudhury, et al. PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination, 2020, ICML.
[20] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[21] Tara N. Sainath, et al. Recognizing Long-Form Speech Using Streaming End-to-End Models, 2019, ASRU.
[22] Yifan Gong, et al. Improving RNN Transducer Modeling for End-to-End Speech Recognition, 2019, ASRU.
[23] Chuang Gan, et al. Once for All: Train One Network and Specialize it for Efficient Deployment, 2019, ICLR.
[24] Linhao Dong, et al. CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition, 2019, ICASSP.
[25] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[26] Tara N. Sainath, et al. Streaming End-to-End Speech Recognition for Mobile Devices, 2018, ICASSP.
[27] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[28] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[29] John R. Hershey, et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition, 2017, IEEE Journal of Selected Topics in Signal Processing.
[30] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[31] Quoc V. Le, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 2015, ICASSP.
[32] Yajie Miao, et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, 2015, ASRU.
[33] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[34] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, ICASSP.
[35] Alex Graves. Sequence Transduction with Recurrent Neural Networks, 2012, arXiv.
[36] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[37] S. Hochreiter, et al. Long Short-Term Memory, 1997, Neural Computation.