Robust Estimation of Fundamental Frequency Using Single Frequency Filtering Approach

A new method for robust estimation of the fundamental frequency (F0) of speech signals is proposed in this paper. The method exploits the high-SNR regions of speech in both the time and frequency domains, as revealed by the outputs of single frequency filtering (SFF) of the speech signal. The high frequency resolution of SFF brings out the harmonic structure of speech clearly, and the harmonic spacing in the high-SNR regions of the spectrum determines F0. The concept of the root cepstrum is used to reduce the effects of vocal tract resonances on F0 estimation. The proposed method is evaluated on clean speech and on noisy speech simulated with 15 different types of degradation at different noise levels. Its performance is compared with that of four other standard F0 extraction methods. The results show that the proposed method is robust for most types of degradation.
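The pipeline summarized above (SFF envelopes, a high-resolution spectrum, root compression, and detection of the harmonic spacing) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `sff_envelopes` and `estimate_f0` are hypothetical. The following are assumed values, not taken from the paper: pole radius r = 0.995, root exponent gamma = 0.3, a 10 Hz frequency grid, an 80 to 400 Hz search range, and time averaging over the second half of the segment. The selection of high-SNR regions described in the paper is not reproduced, and the sketch returns a single F0 for a segment assumed to be voiced. The original SFF formulation shifts each frequency to fs/2 and filters with a pole at z = -r. Demodulating to DC and filtering with a pole at z = r, as done here, gives the same envelope magnitude.

```python
import cmath
import math


def sff_envelopes(x, fs, freqs, r=0.995):
    """Hypothetical helper: one SFF magnitude envelope |u_k[n]| per frequency (Hz).

    Each frequency f_k is demodulated to DC by exp(-j*2*pi*f_k*n/fs) and passed
    through the single-pole filter u[n] = r*u[n-1] + v[n]. This has the same
    magnitude as shifting f_k to fs/2 and filtering with a pole at z = -r.
    """
    envelopes = []
    for f in freqs:
        step = cmath.exp(-2j * math.pi * f / fs)  # per-sample demodulation rotation
        phasor = 1 + 0j                           # exp(-j*w_k*n), updated recursively
        u = 0j
        env = []
        for sample in x:
            u = r * u + sample * phasor           # single-pole filter, pole at z = r
            env.append(abs(u))
            phasor *= step
        envelopes.append(env)
    return envelopes


def estimate_f0(x, fs, f0_min=80.0, f0_max=400.0, df=10.0, r=0.995, gamma=0.3):
    """Hypothetical single-segment F0 estimate from an SFF spectrum and a root cepstrum.

    All default parameter values are assumptions, not values from the paper.
    High-SNR region selection is omitted.
    """
    freqs = [k * df for k in range(int(fs / 2 / df))]  # 0 .. fs/2 on a df-Hz grid
    envelopes = sff_envelopes(x, fs, freqs, r)
    start = len(x) // 2                                  # skip filter start-up transient
    # Time-averaged envelopes form one high-resolution magnitude spectrum.
    spectrum = [sum(env[start:]) / (len(env) - start) for env in envelopes]
    # Root compression |S|^gamma reduces the dominance of strong (formant) regions.
    compressed = [s ** gamma for s in spectrum]
    # Removing the mean suppresses the zero-quefrency term (a constant floor).
    mean = sum(compressed) / len(compressed)
    compressed = [c - mean for c in compressed]
    # Evaluate the cosine transform (root cepstrum) at quefrency 1/f0 seconds
    # for candidate F0 values on a 0.5 Hz grid; keep the largest value.
    n_cand = int(round((f0_max - f0_min) / 0.5)) + 1
    best_f0, best_val = f0_min, -math.inf
    for i in range(n_cand):
        f0 = f0_min + 0.5 * i
        val = sum(c * math.cos(2 * math.pi * f / f0) for c, f in zip(compressed, freqs))
        if val > best_val:
            best_f0, best_val = f0, val
    return best_f0


if __name__ == "__main__":
    fs = 8000
    # Synthetic voiced segment: F0 = 200 Hz, harmonics 1..10 with 1/h amplitudes, 0.2 s.
    x = [sum(math.cos(2 * math.pi * 200 * h * n / fs) / h for h in range(1, 11))
         for n in range(1600)]
    print(estimate_f0(x, fs))  # expected to be close to 200
```

Evaluating the cosine transform directly at each candidate quefrency 1/F0 avoids an FFT and gives fine F0 resolution, at the cost of speed. A vectorized version would be much faster on real recordings, for example one that runs the single-pole recursion with `scipy.signal.lfilter`.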
