Subband architecture for automatic speaker recognition

Abstract We present an original approach to automatic speaker identification that is especially suited to environments in which part of the signal's frequency spectrum is corrupted. The general principle is to split the whole frequency domain into several subbands, apply a statistical recognizer to each subband independently, and then recombine the recognizer outputs to yield a global score and a global recognition decision. The choice of the subband architecture and of the recombination strategy is discussed in particular detail. This technique has been shown to be robust for speech recognition when narrow-band noise degradation occurs. We first verify this robustness objectively for the speaker identification task. We also study which information is actually used to recognize speakers. To this end, speaker identification experiments on independent subbands are conducted for the 630 speakers of the TIMIT and NTIMIT databases. The results show that speaker-specific information is not equally distributed among subbands. In particular, the low-frequency subbands (under 600 Hz) and the high-frequency subbands (over 3000 Hz) are more speaker-specific than the middle-frequency ones. In addition, experiments on different subband system architectures show that the correlations between frequency channels are of prime importance for speaker recognition. Some of these correlations are lost when the frequency domain is divided into subbands. Consequently, we propose a deliberately redundant parallel architecture in which most of the correlations are kept. The performance obtained with this new system, using linear recombination strategies, is equivalent to that of a conventional fullband recognizer on clean and telephone speech. Experiments on speech corrupted by unpredictable noise show that this approach adapts better to noisy environments than a conventional recognizer, especially when some of the subband recognizers are pruned.
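
The recombination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `recombine` and `identify`, the equal weights, the toy scores, and the choice to pass the pruned subbands explicitly are all assumptions made for illustration. The abstract does not say how the weights are set or how the recognizers to prune are selected.

```python
from typing import Dict, Iterable, List, Sequence


def recombine(scores: Dict[str, List[float]],
              weights: Sequence[float],
              pruned: Iterable[int] = ()) -> Dict[str, float]:
    """Weighted linear recombination of per-subband scores.

    scores maps each speaker to one score per subband (higher is better).
    Subbands whose index is in `pruned` are ignored (assumed selection rule).
    """
    skip = set(pruned)
    kept = [b for b in range(len(weights)) if b not in skip]
    total = sum(weights[b] for b in kept)
    if total <= 0:
        raise ValueError("at least one subband with positive weight must be kept")
    # Normalise by the kept weights so scores stay comparable after pruning.
    return {
        speaker: sum(weights[b] * band_scores[b] for b in kept) / total
        for speaker, band_scores in scores.items()
    }


def identify(scores: Dict[str, List[float]],
             weights: Sequence[float],
             pruned: Iterable[int] = ()) -> str:
    """Global decision: the speaker with the highest recombined score."""
    combined = recombine(scores, weights, pruned)
    return max(combined, key=combined.get)


# Hypothetical log-likelihood-style scores for two speakers over three subbands.
# Subband 2 is imagined to be corrupted by narrow-band noise.
example_scores = {
    "spk_a": [-1.0, -2.0, -9.0],
    "spk_b": [-2.0, -3.0, -1.0],
}
example_weights = [1.0, 1.0, 1.0]

print(identify(example_scores, example_weights))              # all subbands
print(identify(example_scores, example_weights, pruned={2}))  # subband 2 pruned
```

In this toy case the corrupted subband alone flips the global decision, and pruning it restores the decision supported by the two remaining subbands. That is the kind of behaviour the abstract attributes to pruning in noisy conditions.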
