Analysis and Selection of Prosodic Features for Language Identification

Prosodic features are relatively simple in their structures and are believed to be effective in some speech recognition tasks. However, this kind of features is subject to undesirable bias factors, such as speaking styles. To cope with this, researchers have suggested various normalization and measure methods to the features, which makes the feature inventory very large. In this paper, we use a mutual information criterion to analyze and select a number of prosody-related features in a language identification (LID) task. Among twelve optimal features, eight of them are elaborated in this paper. The feature analysis metric, z-score, is shown to have a moderate to high correlation with LID accuracies. Feature selection proposed in this paper brings about the best performance among all prosodic LID systems to our knowledge. A further attempt in system fusion shows a 13% relative improvement the prosodic LID system brings to the conventional phonotactic approach to LID.

[1]  Bayya Yegnanarayana,et al.  Extraction and representation of prosodic features for language and speaker recognition , 2008, Speech Commun..

[2]  Haizhou Li,et al.  Advances in Chinese Spoken Language Processing , 2006 .

[3]  Hsiao-Chuan Wang,et al.  Language Identification Using Pitch Contour Information in the Ergodic Markov Model , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[4]  Bin Ma,et al.  A Vector Space Modeling Approach to Spoken Language Identification , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[5]  Andreas Stolcke,et al.  Modeling prosodic feature sequences for speaker recognition , 2005, Speech Commun..

[6]  Susanne Burger,et al.  Syllable detection in read and spontaneous speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[7]  Ronald A. Cole,et al.  The OGI multi-language telephone speech corpus , 1992, ICSLP.

[8]  Tan Lee,et al.  Entropy-Based Analysis of the Prosodic Features of Chinese Dialects , 2008, 2008 6th International Symposium on Chinese Spoken Language Processing.

[9]  Roberto Battiti,et al.  Using mutual information for selecting features in supervised neural net learning , 1994, IEEE Trans. Neural Networks.

[10]  Y.K. Muthusamy,et al.  Reviewing automatic language identification , 1994, IEEE Signal Processing Magazine.

[11]  Hsiao-Chuan Wang,et al.  Language identification using pitch contour information , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[12]  F. Ramus,et al.  Language identification with suprasegmental cues: a study based on speech resynthesis. , 1999, The Journal of the Acoustical Society of America.

[13]  Jean-Luc Rouas Automatic Prosodic Variations Modeling for Language and Dialect Discrimination , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[14]  Tan Lee,et al.  TONE MODELING FOR SPEECH RECOGNITION , 2006 .

[15]  Hiroya Fujisaki,et al.  Information, prosody, and modeling - with emphasis on tonal features of speech - , 2004, Speech Prosody 2004.

[16]  Jérôme Farinas,et al.  Modeling prosody for language identification on read and spontaneous speech , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).