Paper information - A phone-based approach to non-linguistic speech feature identification

A phone-based approach to non-linguistic speech feature identification

Abstract: In this paper we present a general approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. The basic idea is to process the unknown speech signal with feature-specific phone model sets in parallel, and to hypothesize the feature value associated with the model set having the highest likelihood. This technique is shown to be effective for text-independent gender, speaker and language identification. Text-independent speaker identification accuracies of 98.8% on TIMIT (168 speakers) and 99.2% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% with two utterances for both corpora. Experiments in which speaker-specific models were estimated without using the phonetic transcriptions for the TIMIT speakers had the same identification accuracies as those obtained with the use of the transcriptions. French/English language identification accuracy is better than 99% with 2 s of read, laboratory speech. For spontaneous telephone speech from the OGI corpus, the language can be identified as French or English with 82% accuracy with 10 s of speech. The ten-language identification rate using the OGI corpus was 59.7% with 10 s of signal.
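The decision rule described in the abstract (score the signal against each feature-specific phone model set, then pick the feature value whose set scores highest) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: each "phone model" is a single diagonal-covariance Gaussian rather than an HMM, the per-frame best-phone score is a crude stand-in for decoding through a phone loop, and the names `log_gauss`, `model_set_log_likelihood`, `identify` and the toy `MODEL_SETS` values are hypothetical, not taken from the paper.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def model_set_log_likelihood(frames, phone_models):
    """Score an utterance against one feature-specific phone model set.

    Assumption: each frame is scored by its best-matching phone model,
    with no transition or duration modelling.
    """
    return sum(
        max(log_gauss(frame, mean, var) for mean, var in phone_models)
        for frame in frames
    )


def identify(frames, model_sets):
    """Return the feature value whose model set has the highest score."""
    scores = {
        label: model_set_log_likelihood(frames, phones)
        for label, phones in model_sets.items()
    }
    return max(scores, key=scores.get), scores


# Toy model sets (hypothetical values): two "phones" per feature value,
# each given as a (mean, variance) pair over 2-dimensional frames.
MODEL_SETS = {
    "female": [([2.0, 0.0], [1.0, 1.0]), ([3.0, 1.0], [1.0, 1.0])],
    "male": [([-2.0, 0.0], [1.0, 1.0]), ([-3.0, 1.0], [1.0, 1.0])],
}

frames = [[1.8, 0.1], [2.9, 0.8], [2.2, -0.3]]
label, scores = identify(frames, MODEL_SETS)
print(label, scores)
```

The same `identify` call would serve for speaker or language identification by changing the keys of the model-set dictionary; only the labels and the models behind them differ, which mirrors the generality the abstract claims for the approach.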

Jean-Luc Gauvain | Lori Lamel
