Paper information - PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise

We present PEFAC, a fundamental frequency estimation algorithm for speech that identifies voiced frames and estimates pitch reliably even at negative signal-to-noise ratios. The algorithm combines two stages. A normalization stage removes channel dependency and attenuates strong noise components. A harmonic summing filter is then applied in the log-frequency power spectral domain; its impulse response is chosen to sum the energy of the harmonics of the fundamental frequency while attenuating smoothly varying noise components. Temporal continuity constraints are applied to the selected pitch candidates, and a voiced speech probability is computed from the likelihood ratio of two classifiers, one for voiced speech and one for unvoiced speech or silence. We compare the performance of our algorithm with that of other widely used algorithms and demonstrate that it performs well at both high and low levels of additive noise.

Mike Brookes | Sira Gonzalez
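
The harmonic summing idea in the abstract can be illustrated with a minimal sketch. On a log-frequency axis the harmonics of any fundamental sit at fixed offsets log(k) above it, so a single comb-shaped filter can be slid along the axis to score every pitch candidate. The code below is not the authors' implementation: the function name, its parameters, and the filter shape (unit impulses at the harmonic offsets with the mean subtracted, so that a flat noise floor contributes roughly nothing) are assumptions chosen for illustration, and the normalization stage, temporal continuity constraints, and voicing classifier are omitted.

```python
import numpy as np

def log_freq_harmonic_sum(power_spec, freqs, n_harmonics=8,
                          bins_per_octave=96, f_min=60.0, f_max=400.0):
    """Score pitch candidates by summing harmonic energy on a log-frequency axis.

    Hypothetical helper for illustration only; not the PEFAC filter itself.
    """
    # Log-spaced grid from f_min up to the top harmonic of f_max (capped at Nyquist).
    f_top = min(f_max * n_harmonics, freqs[-1])
    n_bins = int(np.ceil(np.log2(f_top / f_min) * bins_per_octave)) + 1
    log_grid = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    log_spec = np.interp(log_grid, freqs, power_spec)

    # Comb with one impulse per harmonic: harmonic k lies log2(k) octaves above f0.
    harmonics = np.arange(1, n_harmonics + 1)
    offsets = np.round(np.log2(harmonics) * bins_per_octave).astype(int)
    comb = np.zeros(offsets[-1] + 1)
    comb[offsets] = 1.0
    comb -= comb.mean()  # zero mean: an assumed, simplified way to reject flat noise

    if len(comb) > len(log_spec):
        raise ValueError("frequency range too narrow for the requested harmonics")

    # score[i] = sum_j comb[j] * log_spec[i + j], one score per candidate f0.
    score = np.correlate(log_spec, comb, mode="valid")
    cand_freqs = log_grid[:len(score)]

    in_range = cand_freqs <= f_max
    best = np.argmax(score[in_range])
    return cand_freqs[in_range][best], cand_freqs, score

# Synthetic test frame: 10 equal-amplitude harmonics of 150 Hz plus white noise.
fs = 8000
n = 512
t = np.arange(n) / fs
clean = sum(np.cos(2 * np.pi * 150.0 * k * t) for k in range(1, 11))
rng = np.random.default_rng(0)
frame = clean + rng.normal(scale=1.0, size=n)

# Windowed, zero-padded power spectrum of the frame.
window = np.hanning(n)
n_fft = 4096
power_spec = np.abs(np.fft.rfft(frame * window, n_fft)) ** 2
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

f0_est, cand_freqs, score = log_freq_harmonic_sum(power_spec, freqs)
print(f"estimated f0: {f0_est:.1f} Hz")
```

Because the comb is applied by correlation, one pass over the log-frequency spectrum scores all candidates at once. A full system along the lines of the abstract would still need to track the candidates over time and decide whether each frame is voiced.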
