Low-Resource Language Identification From Speech Using Transfer Learning

Classification with low-resource data is traditionally a difficult machine learning problem, since the scarcity of labeled examples prevents classifiers from being adequately trained. An effective way to address the data sparsity that is inevitable in certain applications, such as low-resource spoken language identification, is transfer learning, which applies knowledge learned from tasks with large labeled datasets to settings where data is limited. Motivated by the fact that many languages share phonetic and phonotactic characteristics, we explore transfer learning systems that employ various neural network architectures. We leverage readily available large datasets to build robust language identification models using feed-forward neural networks. These models are then fine-tuned on the low-resource data from a target domain to improve system performance. We apply the proposed approach to the automatic identification of African languages, a challenging task because little data is available for these languages. We conduct our experiments using two publicly available datasets: the VoxForge corpus, which contains 7 Indo-European languages, as source data, and the Lwazi corpus, which includes 11 African languages, as target data. Our results indicate the effectiveness of transfer learning for the identification of low-resource languages from speech signals.
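The pretrain-then-fine-tune idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system evaluated in this work: the one-hidden-layer network, the 13-dimensional features, the layer sizes, the learning rates, and the random synthetic data standing in for the source and target corpora are all hypothetical. Only the class counts (7 source languages, 11 target languages) come from the abstract. The sketch also assumes that all layers are updated during fine-tuning and that the output layer is re-initialized for the target label set.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    """Random weights for a one-hidden-layer feed-forward classifier."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(p, X):
    """Return hidden activations and softmax class probabilities."""
    h = np.tanh(X @ p["W1"] + p["b1"])
    logits = h @ p["W2"] + p["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return h, probs

def loss(p, X, y):
    """Mean cross-entropy of the model on (X, y)."""
    _, probs = forward(p, X)
    return float(-np.log(probs[np.arange(len(y)), y] + 1e-12).mean())

def train(p, X, y, steps, lr):
    """Full-batch gradient descent; updates the parameters in place."""
    for _ in range(steps):
        h, probs = forward(p, X)
        d_logits = probs.copy()
        d_logits[np.arange(len(y)), y] -= 1.0
        d_logits /= len(y)
        # Backpropagate through the output layer before it is updated.
        d_h = (d_logits @ p["W2"].T) * (1.0 - h ** 2)
        p["W2"] -= lr * (h.T @ d_logits)
        p["b2"] -= lr * d_logits.sum(axis=0)
        p["W1"] -= lr * (X.T @ d_h)
        p["b1"] -= lr * d_h.sum(axis=0)
    return p

def transfer(source, n_target_classes, rng):
    """Copy the hidden layer; re-initialize the output layer for the target labels."""
    n_hidden = source["W1"].shape[1]
    return {
        "W1": source["W1"].copy(),
        "b1": source["b1"].copy(),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_target_classes)),
        "b2": np.zeros(n_target_classes),
    }

rng = np.random.default_rng(0)
N_FEATS, N_HIDDEN = 13, 32          # hypothetical feature and layer sizes
N_SOURCE_LANGS, N_TARGET_LANGS = 7, 11

# Synthetic stand-ins: a larger "source" set and a small "target" set.
X_src = rng.normal(size=(700, N_FEATS))
y_src = rng.integers(0, N_SOURCE_LANGS, 700)
X_tgt = rng.normal(size=(110, N_FEATS))
y_tgt = rng.integers(0, N_TARGET_LANGS, 110)

# 1) Pretrain on the source languages.
source_model = init_mlp(N_FEATS, N_HIDDEN, N_SOURCE_LANGS, rng)
train(source_model, X_src, y_src, steps=300, lr=0.2)

# 2) Transfer the hidden layer and fine-tune on the target languages.
target_model = transfer(source_model, N_TARGET_LANGS, rng)
loss_before = loss(target_model, X_tgt, y_tgt)
train(target_model, X_tgt, y_tgt, steps=300, lr=0.05)
loss_after = loss(target_model, X_tgt, y_tgt)
```

Because the synthetic labels are random, the sketch only shows the mechanics (which weights are reused, which are replaced); it says nothing about the accuracy gains reported for real speech data.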
