Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation
Nam Soo Kim | Won Ik Cho | Jiwon Yoon | Donghyun Kwak