Modulation Spectral Features for Robust Far-Field Speaker Identification

In this paper, auditory-inspired modulation spectral features are used to improve automatic speaker identification (ASI) performance in the presence of room reverberation. The modulation spectral signal representation is obtained by first filtering the speech signal with a 23-channel gammatone filterbank. An eight-channel modulation filterbank is then applied to the temporal envelope of each gammatone filter output. Features are extracted from modulation frequency bands in the 3-15 Hz range and are shown to be robust both to mismatch between training and testing conditions and to increasing reverberation levels. To demonstrate the gains obtained with the proposed features, experiments are performed with clean speech, artificially generated reverberant speech, and reverberant speech recorded in a meeting room. Simulation results show that a Gaussian mixture model-based ASI system trained on the proposed features consistently outperforms a baseline system trained on mel-frequency cepstral coefficients. For multimicrophone ASI applications, three multichannel score combination and adaptive channel selection techniques are investigated and shown to further improve ASI performance.
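The two-stage analysis described in the abstract (gammatone filtering, then modulation analysis of each channel's temporal envelope) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Only the channel counts (23 and 8) and the 3-15 Hz range come from the abstract. Everything else is an assumption made for illustration: the 125-3500 Hz acoustic range, the 2-64 Hz log-spaced modulation band edges, the FIR gammatone approximation, the Hilbert envelope, the use of rectangular FFT bands in place of a true modulation filterbank, and the hypothetical function names.

```python
import numpy as np

def erb_space(f_lo, f_hi, n):
    """n center frequencies (Hz) equally spaced on the ERB-rate scale."""
    to_erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    e = np.linspace(to_erb(f_lo), to_erb(f_hi), n)
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Unit-energy FIR approximation of a gammatone filter (4th order by default)."""
    t = np.arange(int(duration * fs)) / fs
    bw = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)  # bandwidth tied to the ERB at fc
    g = t ** (order - 1) * np.exp(-2 * np.pi * bw * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))

def temporal_envelope(x):
    """Hilbert envelope: magnitude of the analytic signal, computed via the FFT."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    gain[1:(n + 1) // 2] = 2.0   # double the positive frequencies
    if n % 2 == 0:
        gain[n // 2] = 1.0       # keep the Nyquist bin as is
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

def modulation_energies(x, fs, n_acoustic=23, n_mod=8,
                        f_lo=125.0, f_hi=3500.0, mod_lo=2.0, mod_hi=64.0):
    """Return an (n_acoustic, n_mod) matrix of modulation-band energies.

    ASSUMPTION: the frequency ranges and the rectangular modulation bands
    are illustrative choices, not values taken from the paper.
    """
    fc = erb_space(f_lo, f_hi, n_acoustic)
    edges = np.geomspace(mod_lo, mod_hi, n_mod + 1)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    energies = np.zeros((n_acoustic, n_mod))
    for j, f in enumerate(fc):
        band = np.convolve(x, gammatone_ir(f, fs))[:len(x)]  # causal filtering
        env = temporal_envelope(band)
        spec = np.abs(np.fft.rfft(env - env.mean())) ** 2    # envelope spectrum
        for k in range(n_mod):
            in_band = (freqs >= edges[k]) & (freqs < edges[k + 1])
            energies[j, k] = spec[in_band].sum()
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric band centers
    return energies, fc, centers

# Toy input: a 1 kHz tone amplitude-modulated at 8 Hz, 2 s at 8 kHz.
fs = 8000
t = np.arange(2 * fs) / fs
x = (1.0 + 0.8 * np.cos(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 1000 * t)

energies, fc, centers = modulation_energies(x, fs)
keep = (centers >= 3.0) & (centers <= 15.0)  # bands centered within 3-15 Hz
features = energies[:, keep]
```

In this sketch the energies are computed over the whole signal for brevity, and a practical system would presumably compute them on short frames before passing them to the classifier. The final selection step keeps only the modulation bands whose centers fall inside the 3-15 Hz range named in the abstract.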
