Noise-robust F0 estimation using SNR-weighted summary correlograms from multi-band comb filters

This paper proposes a noise-robust pitch estimation algorithm (PEA) based on signal-to-noise ratio (SNR)-weighted correlograms, in which a separate bank of comb filters operates in each of the low, mid, and high frequency bands. Correlograms are obtained by applying autocorrelation directly to the output of the low-frequency filterbank (FBK) and to the output envelopes of all three FBKs. An SNR-weighting scheme is used for channel selection to yield a summary correlogram for each FBK. These summary correlograms are averaged to obtain an overall summary correlogram, which is smoothed over time before peak extraction. The final pitch contour is obtained via dynamic programming. The proposed PEA is evaluated on the Keele corpus with additive white or babble noise. Compared with widely used PEAs, the proposed PEA has the lowest overall gross pitch error (GPE), especially at low SNRs.
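The channel-weighting and peak-picking steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the comb filterbanks, envelope extraction, time smoothing, and dynamic programming are omitted, and the function names, the linear-in-dB weighting rule, the `floor_db` threshold, and the per-channel SNR values are all assumptions made for illustration.

```python
import numpy as np

def normalized_autocorr(frame, max_lag):
    """Biased autocorrelation of one frame, normalized so lag 0 equals 1."""
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return np.zeros(max_lag + 1)
    full = np.correlate(frame, frame, mode="full")
    mid = len(frame) - 1  # index of lag 0 in the "full" output
    return full[mid:mid + max_lag + 1] / energy

def snr_weights(snr_db, floor_db=0.0):
    """Hypothetical weighting rule: channels at or below floor_db are dropped,
    the rest are weighted in proportion to their SNR margin (in dB)."""
    snr_db = np.asarray(snr_db, dtype=float)
    w = np.where(snr_db > floor_db, snr_db - floor_db, 0.0)
    total = w.sum()
    if total > 0:
        return w / total
    return np.full(len(w), 1.0 / len(w))  # fall back to equal weights

def summary_correlogram(channels, snr_db, max_lag):
    """Weighted sum of per-channel autocorrelations for one frame."""
    weights = snr_weights(snr_db)
    acfs = np.array([normalized_autocorr(c, max_lag) for c in channels])
    return weights @ acfs

def estimate_f0(summary, fs, fmin=60.0, fmax=400.0):
    """Pick the strongest summary peak inside the plausible pitch-lag range."""
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), len(summary) - 1)
    lag = lo + int(np.argmax(summary[lo:hi + 1]))
    return fs / lag

# Toy frame: one channel carries a 200 Hz harmonic complex, one carries noise.
fs = 8000
t = np.arange(400) / fs  # 50 ms frame
clean = sum(np.cos(2 * np.pi * 200 * k * t) for k in (1, 2, 3))
noise = np.random.default_rng(0).standard_normal(len(t))

# Assumed per-channel SNR estimates in dB (the paper's estimator is not shown).
summary = summary_correlogram([clean, noise], snr_db=[20.0, -5.0], max_lag=133)
f0 = estimate_f0(summary, fs)
```

In this toy case the noise-only channel receives zero weight, so the summary correlogram is driven entirely by the harmonic channel and its peak falls at the pitch period. A fuller version would compute one such summary per filterbank and average them before smoothing and tracking.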
