Pitch Mean Based Frequency Warping

In this paper, a novel pitch mean based frequency warping (PMFW) method is proposed to reduce pitch variability in speech signals at the front end of a speech recognizer. The warping factor is calculated from the average pitch of a speech segment. Two functions describing the relation between the frequency warping factor and the pitch mean are defined and compared. Frequency warping is performed with a simple method that shifts the Mel filter bank frequencies according to the warping factor. To resolve the bandwidth mismatch between the original and the warped spectra, a Mel filter selection strategy is proposed. Finally, the PMFW mel-frequency cepstral coefficients (MFCCs) are extracted by following the regular MFCC procedure with several modifications. Experimental results show that the new PMFW MFCCs are more distinctive than the regular MFCCs.
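The pipeline summarized above can be illustrated with a minimal sketch. The abstract does not give the two warping-factor functions, the warping direction, or the selection rule, so everything below is an assumption made for illustration only: the linear and logarithmic forms in `warp_factor_linear` and `warp_factor_log`, the reference pitch `ref_pitch_hz`, the `slope` value, and the rule of discarding filters whose warped center frequency exceeds the available bandwidth are all hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # Common Mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def warp_factor_linear(pitch_mean_hz, ref_pitch_hz=150.0, slope=0.2):
    # Hypothetical linear relation; equals 1.0 at the reference pitch.
    return 1.0 + slope * (pitch_mean_hz - ref_pitch_hz) / ref_pitch_hz


def warp_factor_log(pitch_mean_hz, ref_pitch_hz=150.0, slope=0.2):
    # Hypothetical logarithmic relation; equals 1.0 at the reference pitch.
    return 1.0 + slope * math.log(pitch_mean_hz / ref_pitch_hz)


def mel_center_frequencies(n_filters, f_low_hz, f_high_hz):
    # Center frequencies (Hz) equally spaced on the Mel scale.
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]


def warp_and_select(centers_hz, alpha, f_max_hz):
    # Scale every center frequency by alpha (assumed warping direction),
    # then keep only the filters that still fit inside the bandwidth.
    warped = [alpha * f for f in centers_hz]
    return [f for f in warped if f <= f_max_hz]


centers = mel_center_frequencies(n_filters=24, f_low_hz=0.0, f_high_hz=4000.0)
alpha = warp_factor_linear(pitch_mean_hz=250.0)
kept = warp_and_select(centers, alpha, f_max_hz=4000.0)
```

In this sketch a segment whose pitch mean equals the reference pitch is left unwarped, while a higher pitch mean pushes the upper filters past the band edge, which is the kind of bandwidth mismatch a filter selection step has to handle.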
