Paper Information - Pitch Estimation Based on a Harmonic Sinusoidal Autocorrelation Model and a Time-Domain Matching Scheme

This paper proposes a method for estimating pitch from noise-corrupted speech by extracting a pitch harmonic and its corresponding harmonic number. Starting from the harmonic representation of clean speech, a simple yet accurate harmonic sinusoidal autocorrelation (HSAC) model is first derived. Using this HSAC model, which is expressed in terms of the pitch harmonics of the clean speech, a new autocorrelation-domain least-squares fitting technique is developed to extract a pitch harmonic from the noisy speech. The harmonic number associated with that pitch harmonic is then determined by maximizing an objective function formulated as an impulse-train-weighted symmetric average magnitude sum function (SAMSF) of the noisy speech. The period of the impulse train is governed by the estimated pitch harmonic, and the maximization is carried out by matching, in the time domain, the periodicity of the impulse train with that of the SAMSF. An SAMSF-based pitch tracking scheme using dynamic programming is devised to obtain a smoothed pitch contour. To demonstrate the efficacy of the proposed method, simulations are conducted on naturally spoken speech signals in the presence of white or multi-talker babble noise at different signal-to-noise ratio (SNR) levels. A comprehensive evaluation of the pitch estimation results shows that the proposed method outperforms some of the state-of-the-art methods at low SNR levels.
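
The harmonic-number step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the function names `samsf` and `pick_harmonic_number` are hypothetical, the SAMSF is taken to be the mean of |x[n] + x[n+m]| over a fixed window (the abstract does not give its exact definition), the pitch harmonic `harmonic_hz` is assumed to be already available from the HSAC least-squares fit, and the tie-breaking tolerance `tol` and the pitch search range are arbitrary choices. Each candidate harmonic number k implies a pitch period, an impulse train is placed at multiples of that period, and the candidate whose impulses best line up with the peaks of the periodicity function is kept.

```python
import numpy as np

def samsf(frame, max_lag):
    """Assumed magnitude-sum periodicity function: mean |x[n] + x[n+m]| per lag m.

    The value is largest when m is a multiple of the signal period.
    This form is an assumption; the abstract does not define the SAMSF.
    """
    n = len(frame) - max_lag  # same window length for every lag
    return np.array([np.mean(np.abs(frame[:n] + frame[m:m + n]))
                     for m in range(max_lag + 1)])

def pick_harmonic_number(frame, fs, harmonic_hz, f0_range=(80.0, 400.0),
                         max_k=10, max_lag=200, tol=0.05):
    """Choose k such that harmonic_hz / k is the most plausible pitch.

    For each candidate k, an impulse train with period fs * k / harmonic_hz
    samples is matched against the periodicity function by averaging the
    function at the impulse positions.
    """
    s = samsf(frame, max_lag)
    scores = {}
    for k in range(1, max_k + 1):
        f0 = harmonic_hz / k
        if not (f0_range[0] <= f0 <= f0_range[1]):
            continue  # candidate pitch outside the assumed search range
        period = fs / f0
        count = int(max_lag // period)
        lags = np.round(np.arange(1, count + 1) * period).astype(int)
        lags = lags[lags <= max_lag]
        if lags.size:
            scores[k] = s[lags].mean()
    if not scores:
        raise ValueError("no candidate pitch inside f0_range")
    best = max(scores.values())
    # Multiples of the true period score as well as the true period itself,
    # so among near-best candidates the smallest k (shortest period) is kept.
    k = min(c for c, v in scores.items() if v >= (1.0 - tol) * best)
    return k, harmonic_hz / k

# Usage on a synthetic voiced frame: five harmonics of a 200 Hz pitch,
# with the 600 Hz component assumed to be the extracted pitch harmonic.
fs = 8000
t = np.arange(1000) / fs
frame = sum(np.cos(2 * np.pi * 200 * h * t) for h in range(1, 6))
k, f0 = pick_harmonic_number(frame, fs, harmonic_hz=600.0)
print(k, f0)
```

Averaging the periodicity function at the impulse positions, rather than summing it, keeps candidates with many impulses from winning on count alone. The dynamic-programming pitch tracking mentioned in the abstract is not covered by this sketch.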