Paper Information - An Improved LSTM For Language Identification

An Improved LSTM For Language Identification

In this paper, we propose a novel framework that combines the phonetic temporal neural model (PTN) with an improved LSTM (IM-LSTM). The improvement is an up-down connection from time step t to time step t+1 in the LSTM structure, which aims to capture latent information from the previous time step. The updated structure better discriminates the frame-level phonetic information produced by the PTN. On the AP16-OLR language identification dataset, our final model achieves relative improvements over the standard PTN of 5.04%, 2.19%, and 2.73% in EER and 6.55%, 5.81%, and 2.23% in Cavg under the 1s, 3s, and full-length utterance conditions, respectively. The proposed framework outperforms the standard PTN and the other proposed models, particularly in the 1s condition, which demonstrates the efficacy and flexibility of the proposed method.
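The relative figures quoted in the abstract can be read as relative reductions of an error metric. The sketch below shows one common way to compute such a figure, assuming that lower EER and Cavg are better and that the improvement is measured against the baseline value. The function name `relative_improvement` and the numeric inputs are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative reduction of an error metric, in percent.

    Assumes a lower metric value is better (as for EER and Cavg).
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# Hypothetical EER values (in percent) for illustration only;
# these are NOT results reported in the paper.
baseline_eer = 10.0
improved_eer = 9.5

gain = relative_improvement(baseline_eer, improved_eer)
print(f"relative improvement: {gain:.2f}%")
```

Under this definition, a positive value means the new model lowers the error relative to the baseline, and a value of zero means no change.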

Liqiang Zhang | Xiang Xie | Hui Deng | Qingran Zhan
