Robust and accurate fundamental frequency estimation based on dominant harmonic components.

This paper presents a new method for robust and accurate fundamental frequency (F0) estimation in the presence of background noise and spectral distortion. Degree of dominance and dominance spectrum are defined based on instantaneous frequencies. The degree of dominance allows one to evaluate the magnitude of individual harmonic components of the speech signals relative to background noise while reducing the influence of spectral distortion. The fundamental frequency is more accurately estimated from reliable harmonic components which are easy to select given the dominance spectra. Experiments are performed using white and babble background noise with and without spectral distortion as produced by a SRAEN filter. The results show that the present method is better than previously reported methods in terms of both gross and fine F0 errors.

[1]  J. L. Flanagan,et al.  PHASE VOCODER , 2008 .

[2]  M. J. Cheng,et al.  Comparative performance study of several pitch detection algorithms , 1975 .

[3]  Francis Charpentier,et al.  Pitch detection using the short-term phase spectrum , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Albert S. Bregman,et al.  The Auditory Scene. (Book Reviews: Auditory Scene Analysis. The Perceptual Organization of Sound.) , 1990 .

[5]  Paul C. Bagshaw,et al.  Enhanced pitch tracking and the processing of F0 contours for computer aided intonation teaching , 1993, EUROSPEECH.

[6]  Fabrice Plante,et al.  A pitch extraction reference database , 1995, EUROSPEECH.

[7]  T. Abe,et al.  The IF Spectrogram : A New Spectral Representation , 1997 .

[8]  Hideki Kawahara,et al.  Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..

[9]  Tomohiro Nakatani,et al.  Harmonic sound stream segregation using localization and its application to speech stream segregation , 1999, Speech Commun..

[10]  Satoshi Nakamura,et al.  Robust fundamental frequency estimation using instantaneous frequencies of harmonic components , 2000, INTERSPEECH.

[11]  Stephanie Seneff,et al.  Robust pitch tracking for prosodic modeling in telephone speech , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[12]  Hajime Kobayashi,et al.  Weighted autocorrelation for pitch extraction of noisy speech , 2001, IEEE Trans. Speech Audio Process..

[13]  Chin-Teng Lin,et al.  Fundamental frequency estimation based on the joint time-frequency analysis of harmonic spectral structure , 2001, IEEE Trans. Speech Audio Process..

[14]  Hideki Kawahara,et al.  YIN, a fundamental frequency estimator for speech and music. , 2002, The Journal of the Acoustical Society of America.

[15]  Stephen A. Zahorian,et al.  Yet Another Algorithm for Pitch Tracking , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.