Improving speaker identification in noise by subband processing and decision fusion

We investigate speaker identification in narrowband noise using subband processing. The output of each subband filter is used to train and test a separate hidden Markov model (HMM), and each model makes a preliminary decision on speaker identity. These preliminary decisions are then combined by decision fusion to produce a final decision. Given a sufficient number of subband filters, subband processing substantially outperforms traditional wideband techniques.
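The fusion step can be illustrated with a minimal sketch. It assumes each subband model has already produced one log-likelihood score per enrolled speaker. The function names, the two fusion rules (majority vote and summed log-likelihoods), and the example scores are all hypothetical illustrations, not the method or data of the paper, which does not specify its fusion rule here.

```python
from collections import Counter

def subband_decisions(scores):
    """Preliminary decision per subband: index of the best-scoring speaker."""
    return [max(range(len(band)), key=band.__getitem__) for band in scores]

def fuse_majority(scores):
    """Majority vote over the per-subband decisions (assumed fusion rule).

    Ties are broken in favour of the lowest speaker index.
    """
    votes = Counter(subband_decisions(scores))
    top = max(votes.values())
    return min(speaker for speaker, count in votes.items() if count == top)

def fuse_sum(scores):
    """Sum log-likelihoods across subbands, then pick the best speaker.

    This treats the subbands as independent, which is an assumption.
    """
    n_speakers = len(scores[0])
    totals = [sum(band[s] for band in scores) for s in range(n_speakers)]
    return max(range(n_speakers), key=totals.__getitem__)

# Hypothetical scores: 4 subbands (rows) x 3 speakers (columns).
# The third subband stands in for a band corrupted by narrowband noise.
scores = [
    [-10.0, -12.0, -15.0],
    [-9.0, -11.0, -14.0],
    [-14.0, -12.0, -13.0],
    [-11.0, -13.0, -12.0],
]

print(subband_decisions(scores))
print(fuse_majority(scores))
print(fuse_sum(scores))
```

In this toy example only one subband is misled, so the uncorrupted bands still determine the fused decision. That is the intuition behind using subbands against narrowband noise.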
