SpeechBERT: An Audio-and-Text Jointly Learned Language Model for End-to-End Spoken Question Answering
Yung-Sung Chuang | Chi-Liang Liu | Hung-yi Lee | Lin-Shan Lee
[1] James R. Glass,et al. Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces , 2018, NeurIPS.
[2] Hung-yi Lee,et al. Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Jason Weston,et al. Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.
[4] Guillaume Lample,et al. Word Translation Without Parallel Data , 2017, ICLR.
[5] Ali Farhadi,et al. Bidirectional Attention Flow for Machine Comprehension , 2016, ICLR.
[6] Lin-Shan Lee,et al. Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence Autoencoder , 2016, INTERSPEECH.
[7] Arun Narayanan,et al. From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[8] James R. Glass,et al. Spoken Content Retrieval—Beyond Cascading Speech Recognition with Text Retrieval , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[9] Ming Zhou,et al. Gated Self-Matching Networks for Reading Comprehension and Question Answering , 2017, ACL.
[10] Lin-Shan Lee,et al. Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification , 2017, INTERSPEECH.
[11] Yuzong Liu,et al. Deep Contextualized Acoustic Representations for Semi-Supervised Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[13] Guangsen Wang,et al. Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks , 2020, INTERSPEECH.
[14] Lin-Shan Lee,et al. Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Yelong Shen,et al. FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension , 2017, ICLR.
[16] Alexei Baevski,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[17] Percy Liang,et al. Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.
[18] Yuxing Peng,et al. Reinforced Mnemonic Reader for Machine Comprehension , 2017 .
[19] Themos Stafylakis,et al. End-to-End Architectures for ASR-Free Spoken Language Understanding , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Hung-yi Lee,et al. Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension , 2018, INTERSPEECH.
[21] Yoshua Bengio,et al. Speech Model Pre-training for End-to-End Spoken Language Understanding , 2019, INTERSPEECH.
[22] Shiyu Zhou,et al. Unsupervised Pre-training for Sequence to Sequence Speech Recognition , 2019, ArXiv.
[23] Brian Kingsbury,et al. Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[25] Lin-Shan Lee,et al. Almost-unsupervised Speech Recognition with Close-to-zero Resource Based on Phonetic Structures Learned from Very Small Unpaired Speech and Text Data , 2018, ArXiv.
[26] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[27] James R. Glass,et al. Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech , 2018, INTERSPEECH.
[28] Xiangang Li,et al. Improving Transformer-based Speech Recognition Using Unsupervised Pre-training , 2019, ArXiv.
[29] Olivier Pietquin,et al. Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation , 2016, NIPS 2016.
[30] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[31] Ryan Price. End-To-End Spoken Language Understanding Without Matched Language Speech Model Pretraining Data , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[33] Srinivas Bangalore,et al. Spoken Language Understanding without Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Omer Levy,et al. SpanBERT: Improving Pre-training by Representing and Predicting Spans , 2019, TACL.
[35] Guillaume Lample,et al. Cross-lingual Language Model Pretraining , 2019, NeurIPS.
[36] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[37] Yongqiang Wang,et al. Towards End-to-end Spoken Language Understanding , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[39] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[40] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.