Speaker normalization based on subglottal resonances

Speaker normalization typically focuses on variability in the supraglottal (vocal tract) resonances, which is a major cause of spectral mismatch. Recent studies show that the subglottal airways also affect the spectral properties of speech sounds. This paper presents a speaker normalization method based on estimating the second and third subglottal resonances. Because the subglottal airways do not change for a given speaker, the subglottal resonances are independent of the sound type (vowel, consonant, etc.) and remain constant for that speaker. This context-free property makes the proposed method suitable for limited-data speaker adaptation. The method is computationally more efficient than maximum-likelihood-based vocal tract length normalization (VTLN) and outperforms it, especially when adaptation data are limited. Experimental results confirm that the method performs well across a variety of testing conditions and tasks.
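The abstract does not spell out how an estimated subglottal resonance becomes a normalization. One plausible reading, sketched below under stated assumptions, is a linear frequency warp whose factor is the ratio of a reference resonance to the speaker's estimated resonance. The function names, the choice of a linear warp, and the resonance values in the usage note are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def warp_factor(sg_speaker_hz, sg_reference_hz):
    """Assumed definition: ratio of a reference subglottal resonance
    to the speaker's estimated subglottal resonance."""
    if sg_speaker_hz <= 0 or sg_reference_hz <= 0:
        raise ValueError("resonance frequencies must be positive")
    return sg_reference_hz / sg_speaker_hz


def warp_spectrum(magnitudes, alpha):
    """Linearly warp a magnitude spectrum so that content at
    frequency f moves to alpha * f (hypothetical helper)."""
    magnitudes = np.asarray(magnitudes, dtype=float)
    bins = np.arange(len(magnitudes), dtype=float)
    # Output bin k reads from source bin k / alpha; positions past
    # the last source bin are filled with zeros.
    source = bins / alpha
    return np.interp(source, bins, magnitudes, right=0.0)
```

As a usage example with made-up numbers, a speaker whose second subglottal resonance is estimated at 1500 Hz, normalized against a reference of 1350 Hz, gets `alpha = warp_factor(1500.0, 1350.0)`, and `warp_spectrum(frame, alpha)` compresses each spectral frame toward lower frequencies. Because the factor depends on a single speaker-constant quantity rather than a likelihood search over candidate warps, a sketch like this needs only one resonance estimate per speaker.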
