Robust Fundamental Frequency Estimation in Coloured Noise

Most parametric fundamental frequency estimators implicitly assume that the corrupting noise is additive, white, and Gaussian. Under this assumption, the maximum likelihood (ML) and least squares estimators coincide and are statistically efficient. In coloured noise, however, the two estimators differ, and the spectral shape of the noise should be taken into account. To this end, we propose two schemes that iteratively refine the noise statistics and the parameter estimates: one based on an approximate ML solution, and the other based on removing the periodic signal obtained from a linearly constrained minimum variance (LCMV) filter. Evaluations on real speech data indicate that the iteration steps improve the estimation accuracy, yielding an improvement over traditional non-parametric fundamental frequency methods in most of the evaluated scenarios.
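The difference between the white-noise and coloured-noise criteria can be illustrated with a minimal sketch. It is not either of the proposed iterative schemes: it assumes a real-valued harmonic model, a known number of harmonics, and a known noise covariance matrix, whereas the proposed methods estimate the noise statistics from the data. The function names (`harmonic_matrix`, `ar1_covariance`, `estimate_f0`) and the AR(1) noise model are hypothetical choices made for illustration. With `noise_cov=None` the cost is the ordinary least-squares residual; with a covariance supplied, the data and the model are whitened by its Cholesky factor first, which gives the Gaussian ML criterion for that covariance.

```python
import numpy as np


def harmonic_matrix(f0, n_harmonics, n_samples, fs):
    """Real harmonic basis: cosine and sine columns at multiples of f0 (Hz)."""
    n = np.arange(n_samples)[:, None]
    k = np.arange(1, n_harmonics + 1)[None, :]
    phase = 2.0 * np.pi * f0 * k * n / fs
    return np.hstack([np.cos(phase), np.sin(phase)])


def ar1_covariance(n_samples, coeff, innovation_var):
    """Covariance matrix of a stationary AR(1) process (illustrative noise model)."""
    idx = np.arange(n_samples)
    lags = np.abs(np.subtract.outer(idx, idx))
    return innovation_var * coeff ** lags / (1.0 - coeff ** 2)


def estimate_f0(x, fs, f0_grid, n_harmonics, noise_cov=None):
    """Grid search over f0 minimising the (optionally whitened) LS residual.

    noise_cov=None  -> plain least squares (ML only if the noise is white Gaussian).
    noise_cov given -> weighted least squares via Cholesky whitening; the
                       covariance is ASSUMED known here, which is a simplification.
    """
    x = np.asarray(x, dtype=float)
    chol = None if noise_cov is None else np.linalg.cholesky(noise_cov)
    xw = x if chol is None else np.linalg.solve(chol, x)
    best_f0, best_cost = None, np.inf
    for f0 in f0_grid:
        Z = harmonic_matrix(f0, n_harmonics, x.size, fs)
        Zw = Z if chol is None else np.linalg.solve(chol, Z)
        # Linear amplitudes for this candidate f0, then the residual energy.
        amps, *_ = np.linalg.lstsq(Zw, xw, rcond=None)
        residual = xw - Zw @ amps
        cost = float(residual @ residual)
        if cost < best_cost:
            best_f0, best_cost = float(f0), cost
    return best_f0


if __name__ == "__main__":
    # Synthetic example: three harmonics of 200 Hz in AR(1) noise.
    rng = np.random.default_rng(0)
    fs, n_samples, true_f0 = 8000.0, 240, 200.0
    t = np.arange(n_samples) / fs
    signal = sum(a * np.cos(2 * np.pi * true_f0 * k * t + 0.3 * k)
                 for k, a in [(1, 1.0), (2, 0.5), (3, 0.3)])
    coeff, sigma = 0.9, 0.05
    noise = np.zeros(n_samples)
    drive = sigma * rng.standard_normal(n_samples)
    for i in range(1, n_samples):
        noise[i] = coeff * noise[i - 1] + drive[i]
    grid = np.arange(100.0, 400.0, 1.0)
    cov = ar1_covariance(n_samples, coeff, sigma ** 2)
    print("LS estimate:      ", estimate_f0(signal + noise, fs, grid, 3))
    print("Whitened estimate:", estimate_f0(signal + noise, fs, grid, 3, noise_cov=cov))
```

In practice the covariance is unknown, which is why the noise statistics and the fundamental frequency have to be refined jointly, as the iterative schemes in the abstract do.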
