Pitch Estimation Based on a Harmonic Sinusoidal Autocorrelation Model and a Time-Domain Matching Scheme

This paper proposes a method for estimating pitch from noise-corrupted speech by extracting a pitch harmonic and its corresponding harmonic number. Starting from the harmonic representation of clean speech, a simple yet accurate harmonic sinusoidal autocorrelation (HSAC) model is first derived. Using this HSAC model, which is expressed in terms of the pitch harmonics of the clean speech, a new autocorrelation-domain least-squares fitting technique is developed to extract a pitch harmonic from the noisy speech. The harmonic number associated with that pitch harmonic is then determined by maximizing an objective function formulated as an impulse-train weighted symmetric average magnitude sum function (SAMSF) of the noisy speech. The period of the impulse train is governed by the estimated pitch harmonic, and the objective function is maximized by matching, in the time domain, the periodicity of the impulse train with that of the SAMSF. An SAMSF-based pitch tracking scheme using dynamic programming is devised to obtain a smoothed pitch contour. To demonstrate the efficacy of the proposed method, simulations are conducted on naturally spoken speech signals corrupted by white noise or multi-talker babble noise at different signal-to-noise ratio (SNR) levels. A comprehensive evaluation of the pitch estimation results shows that, at low SNR levels, the proposed method outperforms some of the state-of-the-art methods.
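The harmonic-number step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a pitch harmonic frequency is already available, and it uses a normalized autocorrelation as a stand-in for the SAMSF, whose definition is not given in this abstract. For each candidate harmonic number m, an impulse train with period m * fs / harmonic_hz samples is laid over the periodicity function, and the candidate whose impulses collect the largest sum is kept. All function names, parameter names, and default values below are hypothetical.

```python
import math


def normalized_autocorrelation(x, max_lag):
    """Biased autocorrelation for lags 0..max_lag, scaled so lag 0 equals 1.

    Stand-in for the SAMSF (assumption: any function that peaks at
    multiples of the pitch period serves for this illustration).
    """
    n = len(x)
    max_lag = min(max_lag, n - 1)
    r0 = sum(v * v for v in x)
    if r0 == 0.0:
        return [0.0] * (max_lag + 1)
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag)) / r0
        for lag in range(max_lag + 1)
    ]


def impulse_train_score(periodicity, period):
    """Sum the periodicity function at integer multiples of `period` (samples)."""
    max_lag = len(periodicity) - 1
    score = 0.0
    k = 1
    while round(k * period) <= max_lag:
        score += periodicity[round(k * period)]
        k += 1
    return score


def pick_harmonic_number(x, fs, harmonic_hz, max_harmonic=6, max_lag=400):
    """Return (m, f0): the harmonic number whose impulse train matches best,
    and the implied pitch f0 = harmonic_hz / m.

    `max_harmonic` and `max_lag` are illustrative defaults, not values
    taken from the paper.
    """
    periodicity = normalized_autocorrelation(x, max_lag)
    best_m, best_score = 1, -math.inf
    for m in range(1, max_harmonic + 1):
        period = m * fs / harmonic_hz  # candidate pitch period in samples
        score = impulse_train_score(periodicity, period)
        if score > best_score:
            best_m, best_score = m, score
    return best_m, harmonic_hz / best_m
```

Calling pick_harmonic_number(frame, fs, harmonic_hz) on one voiced frame returns the selected harmonic number and the implied pitch in Hz. The plain sum is only one reading of "impulse-train weighted": periods shorter than the true one are penalized because some of their impulses fall between the peaks, and periods longer than the true one are penalized because fewer impulses fit within max_lag.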
