Improving Automatic Speech Recognition and Speech Translation via Word Embedding Prediction
Alexander H. Liu | Shun-Po Chuang | Tzu-Wei Sung | Hung-yi Lee
[1] Matthias Sperber,et al. Speech Translation and the End-to-End Promise: Taking Stock of Where We Are , 2020, ACL.
[2] Lin-Shan Lee,et al. Towards End-to-end Speech-to-text Translation with Two-pass Decoding , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Sathish Reddy Indurthi,et al. End-end Speech-to-Text Translation with Modality Agnostic Meta-Learning , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Yulia Tsvetkov,et al. Problems With Evaluation of Word Embeddings Using Word Similarity Tasks , 2016, RepEval@ACL.
[6] Kevin Duh,et al. Multilingual End-to-End Speech Translation , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[7] Lysandre Debut,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[8] Lin-Shan Lee,et al. Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Satoshi Nakamura,et al. Listening while speaking: Speech chain by deep learning , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[10] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Ali Can Kocabiyikoglu,et al. Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation , 2018, LREC.
[12] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[13] Matt Post,et al. Improved speech-to-text translation with the Fisher and Callhome Spanish-English speech translation corpus , 2013, IWSLT.
[14] Liyuan Liu,et al. On the Variance of the Adaptive Learning Rate and Beyond , 2019, ICLR.
[15] Geoffrey Zweig,et al. Linguistic Regularities in Continuous Space Word Representations , 2013, NAACL.
[16] Michael Picheny,et al. Acoustically Grounded Word Embeddings for Improved Acoustics-to-word Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Jiajun Zhang,et al. End-to-End Speech Translation with Knowledge Distillation , 2019, INTERSPEECH.
[18] Satoshi Nakamura,et al. Training Neural Machine Translation using Word Embedding-based Loss , 2018, ArXiv.
[19] Shinji Watanabe,et al. Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[20] Brian Kingsbury,et al. Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Florian Metze,et al. Learned in Speech Recognition: Contextual Acoustic Word Embeddings , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Elizabeth Salesky,et al. Phone Features Improve Speech Translation , 2020, ACL.
[23] Benjamin Lecouteux,et al. Better Evaluation of ASR in Speech Translation Context Using Word Embeddings , 2016, INTERSPEECH.
[24] Bo Xu,et al. Max Margin Cosine Loss for Speaker Identification on Short Utterances , 2018, 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[25] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[26] Stefanos Zafeiriou,et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] David Chiang,et al. An Attentional Model for Speech Translation Without Transcription , 2016, NAACL.
[28] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[29] Anders Søgaard,et al. On the Limitations of Unsupervised Bilingual Dictionary Induction , 2018, ACL.
[30] Elizabeth Salesky,et al. Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation , 2019, ACL.
[31] Navdeep Jaitly,et al. Towards Better Decoding and Language Model Integration in Sequence to Sequence Models , 2016, INTERSPEECH.
[32] Stefan Riezler,et al. On Some Pitfalls in Automatic Evaluation and Significance Testing for MT , 2005, IEEvaluation@ACL.
[33] Andrew L. Maas,et al. Word-level Acoustic Modeling with Convolutional Vector Regression , 2012 .
[34] David Chiang,et al. Tied Multitask Learning for Neural Speech Translation , 2018, NAACL.
[35] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[36] Xing Ji,et al. CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[37] Alex Bewley,et al. Deep Cosine Metric Learning for Person Re-identification , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[38] Navdeep Jaitly,et al. Sequence-to-Sequence Models Can Directly Translate Foreign Speech , 2017, INTERSPEECH.
[39] Matteo Negri,et al. One-to-Many Multilingual End-to-End Speech Translation , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[40] Jian Cheng,et al. Additive Margin Softmax for Face Verification , 2018, IEEE Signal Processing Letters.
[41] Yuan Cao,et al. Leveraging Weakly Supervised Data to Improve End-to-end Speech-to-text Translation , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Yongqiang Wang,et al. Towards End-to-end Spoken Language Understanding , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[44] Olivier Pietquin,et al. End-to-End Automatic Speech Translation of Audiobooks , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[46] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[47] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[48] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Karen Livescu,et al. Deep convolutional acoustic word embeddings using word-pair side information , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[50] Melvin Johnson,et al. Direct speech-to-speech translation with a sequence-to-sequence model , 2019, INTERSPEECH.
[51] Ramón Fernández Astudillo,et al. Self-supervised Sequence-to-sequence ASR using Unpaired Speech and Text , 2019, INTERSPEECH.
[52] Karen Livescu,et al. Discriminative acoustic word embeddings: Recurrent neural network-based approaches , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[53] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[55] Satoshi Nakamura,et al. Using Spoken Word Posterior Features in Neural Machine Translation , 2018 .
[56] Florian Metze,et al. Sequence-Based Multi-Lingual Low Resource Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[57] Yu Zhang,et al. Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM , 2017, INTERSPEECH.
[58] Aren Jansen,et al. Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[59] Tara N. Sainath,et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition , 2017, INTERSPEECH.
[60] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[61] Adam Lopez,et al. Pre-training on high-resource speech recognition improves low-resource speech-to-text translation , 2018, NAACL.
[62] Tomoki Toda,et al. Back-Translation-Style Data Augmentation for end-to-end ASR , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[63] Yulia Tsvetkov,et al. Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs , 2018, ICLR.
[64] Li Deng,et al. Why word error rate is not a good metric for speech recognizer training for the speech translation task? , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[65] Tomas Mikolov,et al. Enriching Word Vectors with Subword Information , 2016, TACL.
[66] F. Jelinek,et al. Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.
[67] Matthias Sperber,et al. Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation , 2019, TACL.
[68] Brian Kingsbury,et al. End-to-end ASR-free keyword search from speech , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[69] Massimo Piccardi,et al. ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems , 2019, NAACL.
[70] Benjamin Heinzerling,et al. BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages , 2017, LREC.
[71] Satoshi Nakamura,et al. Structured-Based Curriculum Learning for End-to-End English-Japanese Speech Translation , 2017, INTERSPEECH.
[72] Satoshi Nakamura,et al. End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[73] Olivier Pietquin,et al. Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation , 2016, NIPS 2016.
[74] Gokhan Tur,et al. Spoken Language Understanding: Systems for Extracting Semantic Information from Speech , 2011 .
[75] Tara N. Sainath,et al. Query-by-example keyword spotting using long short-term memory networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[76] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[77] James R. Glass,et al. Towards Unsupervised Speech-to-text Translation , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).