Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets
[1] Alvin F. Martin, et al. The 2011 NIST Language Recognition Evaluation, 2010, INTERSPEECH.
[2] Shugong Xu, et al. Two-stage Training for Chinese Dialect Recognition, 2019, INTERSPEECH.
[3] Radek Safarík, et al. Using Deep Neural Networks for Identification of Slavic Languages from Acoustic Signal, 2018, INTERSPEECH.
[4] Dong Wang, et al. Phonetic Temporal Neural Model for Language Identification, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[5] James R. Glass, et al. Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition, 2018, Odyssey.
[6] Priyam Jain, et al. Study on the Effect of Emotional Speech on Language Identification, 2020, 2020 National Conference on Communications (NCC).
[7] Sanjeev Khudanpur, et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Jean-Luc Gauvain, et al. Spoken Language Identification Using LSTM-Based Angular Proximity, 2017, INTERSPEECH.
[9] Alan McCree, et al. Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17, 2018, Odyssey.
[10] Radford M. Neal. Pattern Recognition and Machine Learning, 2007, Technometrics.
[11] Shinji Watanabe, et al. ESPnet: End-to-End Speech Processing Toolkit, 2018, INTERSPEECH.
[12] Aku Rouhe, et al. Spherediar: An Effective Speaker Diarization System for Meeting Data, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[13] Sergey Ioffe, et al. Probabilistic Linear Discriminant Analysis, 2006, ECCV.
[14] Shuai Wang, et al. What Does the Speaker Embedding Encode?, 2017, INTERSPEECH.
[15] Lin Li, et al. Phone-Aware Multi-task Learning and Length Expanding for Short-Duration Language Recognition, 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[16] Abualsoud Hanani, et al. Spoken Arabic dialect recognition using X-vectors, 2020.
[17] Dirk Van Compernolle, et al. Increasing the robustness of CNN acoustic models using autoregressive moving average spectrogram features and channel dropout, 2017, Pattern Recognit. Lett.
[18] Alan McCree, et al. The JHU-MIT System Description for NIST SRE18, 2019.
[19] Sanjeev Khudanpur, et al. Spoken Language Recognition using X-vectors, 2018, Odyssey.
[20] James R. Glass, et al. Automatic Dialect Detection in Arabic Broadcast Speech, 2015, INTERSPEECH.
[21] Dong Wang, et al. AP19-OLR Challenge: Three Tasks and Their Baselines, 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[22] Yonghong Yan, et al. A New Time-Frequency Attention Mechanism for TDNN and CNN-LSTM-TDNN, with Application to Language Identification, 2019, INTERSPEECH.
[23] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[24] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[25] Colin Raffel, et al. librosa: Audio and Music Signal Analysis in Python, 2015, SciPy.
[26] Jun Guo, et al. Short Utterance Based Speech Language Identification in Intelligent Vehicles With Time-Scale Modifications and Deep Bottleneck Features, 2019, IEEE Transactions on Vehicular Technology.
[27] Titouan Parcollet, et al. The Pytorch-kaldi Speech Recognition Toolkit, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Quan Wang, et al. Tuplemax Loss for Language Identification, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Bin Ma, et al. Spoken Language Recognition: From Fundamentals to Practice, 2013, Proceedings of the IEEE.
[30] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[31] Yajie Miao, et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[32] Ming Li, et al. On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition, 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] Ji Gao, et al. DKU-Tencent Submission to Oriental Language Recognition AP18-OLR Challenge, 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).