PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise

We present PEFAC, a fundamental frequency estimation algorithm for speech that is able to identify voiced frames and estimate pitch reliably even at negative signal-to-noise ratios. The algorithm combines a normalization stage, which removes channel dependency and attenuates strong noise components, with a harmonic summing filter applied in the log-frequency power spectral domain. The impulse response of this filter is chosen to sum the energy of the harmonics of the fundamental frequency while attenuating smoothly varying noise components. Temporal continuity constraints are applied to the selected pitch candidates, and a voiced speech probability is computed from the likelihood ratio of two classifiers, one for voiced speech and one for unvoiced speech/silence. We compare the performance of our algorithm with that of other widely used algorithms and demonstrate that it performs well in both high and low levels of additive noise.
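
The harmonic summing step can be illustrated with a minimal sketch. On a log-frequency axis the harmonics of any fundamental sit at the fixed offsets log2(k) octaves above it, so a single filter can be slid along the axis to score every pitch candidate. The code below is an illustration under stated assumptions and not the filter used in PEFAC: the function name `log_freq_harmonic_sum`, its parameters, and the plain impulse comb with its mean subtracted are all hypothetical choices. The normalization stage, the temporal continuity constraints, and the voiced/unvoiced classifiers are omitted.

```python
import numpy as np

def log_freq_harmonic_sum(power, freqs, n_harmonics=8,
                          bins_per_octave=48, f_min=40.0):
    """Score pitch candidates by summing harmonic energy on a log-frequency axis.

    Illustrative only: a zero-mean impulse comb stands in for the paper's filter.
    `power` is a one-frame power spectrum sampled at the increasing
    frequencies `freqs` (Hz). Returns (candidate_f0s, scores).
    """
    # Log-spaced grid from f_min up to (at most) the highest input frequency.
    n_bins = int(np.floor(bins_per_octave * np.log2(freqs[-1] / f_min))) + 1
    log_grid = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    power_on_log_grid = np.interp(log_grid, freqs, power)

    # Harmonic k lies log2(k) octaves above the fundamental, whatever f0 is.
    harmonics = np.arange(1, n_harmonics + 1)
    offsets = np.round(bins_per_octave * np.log2(harmonics)).astype(int)
    comb = np.zeros(offsets[-1] + 1)
    comb[offsets] = 1.0
    # Subtracting the mean gives zero response to a flat spectrum, a crude
    # way to suppress smoothly varying noise.
    comb -= comb.mean()

    # scores[i] = sum_j comb[j] * power_on_log_grid[i + j]
    scores = np.correlate(power_on_log_grid, comb, mode="valid")
    return log_grid[:scores.size], scores
```

The candidate with the highest score, `candidates[np.argmax(scores)]`, serves as a raw pitch estimate for the frame. A comb this simple is prone to octave errors when many harmonics are present, which is one reason a practical tracker needs a carefully shaped impulse response and temporal smoothing.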
