Alexei Baevski | Michael Auli | Abdelrahman Mohamed
[1] Chris Dyer,et al. Unsupervised Learning of Efficient and Robust Speech Representations , 2019 .
[2] Jonathan G. Fiscus,et al. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM , 1993, NIST.
[3] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[4] Gregory Shakhnarovich,et al. Visually Grounded Learning of Keyword Prediction from Untranscribed Speech , 2017, INTERSPEECH.
[5] Georg Heigold,et al. Word embeddings for speech recognition , 2014, INTERSPEECH.
[6] Myle Ott,et al. Scaling Neural Machine Translation , 2018, WMT.
[7] Tatsuya Kawahara,et al. Semi-supervised ensemble DNN acoustic model training , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[9] Aren Jansen,et al. Unsupervised neural network based feature extraction using weak top-down constraints , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Grzegorz Chrupala,et al. Representations of language in a model of visually grounded speech signal , 2017, ACL.
[11] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[12] Gabriel Synnaeve,et al. Wav2Letter++: A Fast Open-source Speech Recognition System , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Tara N. Sainath,et al. Semi-supervised Training for End-to-end Models via Weak Distillation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Kenneth Ward Church,et al. Towards spoken term discovery at scale with zero resources , 2010, INTERSPEECH.
[15] Rohit Prabhavalkar,et al. On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition , 2019, INTERSPEECH.
[16] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[17] Aren Jansen,et al. Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[18] Brian Kingsbury,et al. Multilingual representations for low resource speech recognition and keyword search , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[19] Hao Tang,et al. An Unsupervised Autoregressive Model for Speech Representation Learning , 2019, INTERSPEECH.
[20] Geoffrey E. Hinton,et al. Understanding how Deep Belief Networks perform acoustic modelling , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Kenneth Ward Church,et al. A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[22] Steve Renals,et al. Multilingual training of deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[23] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[24] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[25] Florian Metze,et al. Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the “Speaking Rosetta” JSALT 2017 Workshop , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Luke S. Zettlemoyer,et al. Cloze-driven Pretraining of Self-attention Networks , 2019, EMNLP.
[27] Lin-Shan Lee,et al. Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence Autoencoder , 2016, INTERSPEECH.
[28] Hermann Ney,et al. RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation , 2019, INTERSPEECH.
[29] Karen Livescu,et al. Multi-view Recurrent Neural Acoustic Word Embeddings , 2016, ICLR.
[30] Lukás Burget,et al. Semi-supervised training of Deep Neural Networks , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[31] Michael Picheny,et al. Acoustically Grounded Word Embeddings for Improved Acoustics-to-word Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[33] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models , 1994 .
[34] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Haizhou Li,et al. Semi-Supervised and Cross-Lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models Under Low-Resource Conditions , 2016, INTERSPEECH.
[36] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[37] Yifan Gong,et al. Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[38] Sree Hari Krishnan Parthasarathi,et al. Lessons from Building Acoustic Models with a Million Hours of Speech , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Gabriel Synnaeve,et al. Wav2Letter: an End-to-End ConvNet-based Speech Recognition System , 2016, ArXiv.
[40] James R. Glass. Towards unsupervised speech processing , 2012, 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA).
[41] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[42] Iasonas Kokkinos,et al. Learning Filterbanks from Raw Speech for Phone Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] James R. Glass,et al. Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech , 2018, INTERSPEECH.
[44] Ewan Dunbar,et al. Learning Weakly Supervised Multimodal Phoneme Embeddings , 2017, INTERSPEECH.
[45] Armand Joulin,et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[46] Awni Hannun,et al. Self-Training for End-to-End Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[47] Luke S. Zettlemoyer,et al. Transformers with convolutional context for ASR , 2019, ArXiv.
[48] James L. McClelland,et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .
[49] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[50] Georg Heigold,et al. Multilingual acoustic models using distributed deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[51] Alexei Baevski,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[52] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[53] Ngoc Thang Vu,et al. Multilingual deep neural network based acoustic modeling for rapid language adaptation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] R Devon Hjelm,et al. Learning Representations by Maximizing Mutual Information Across Views , 2019, NeurIPS.
[55] James R. Glass,et al. Unsupervised Pattern Discovery in Speech , 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[56] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[57] Geoffrey Zweig,et al. Transformer-Based Acoustic Modeling for Hybrid Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[58] Oriol Vinyals,et al. Neural Discrete Representation Learning , 2017, NIPS.
[59] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.