Single and Multiple F0 Contour Estimation Through Parametric Spectrogram Modeling of Speech in Noisy Environments

This paper proposes a novel F0 contour estimation algorithm based on a precise parametric description of the voiced parts of speech, derived from the power spectrum. The algorithm works in a wide variety of noisy environments and can also estimate the F0s of cochannel concurrent speech. The speech spectrum is modeled as a sequence of spectral clusters governed by a common F0 contour expressed as a spline curve. These clusters are obtained by an unsupervised 2-D time-frequency clustering of the power density using a new formulation of the EM algorithm, and their common F0 contour is estimated at the same time. A smooth F0 contour is extracted for the whole utterance, linking its voiced parts together. A noise model is used to cope with nonharmonic background noise, which would otherwise interfere with the clustering of the harmonic portions of speech. We evaluate our algorithm against existing methods on several tasks and show 1) that it is competitive on clean single-speaker speech, 2) that it outperforms existing methods in the presence of noise, and 3) that it outperforms existing methods in estimating the multiple F0 contours of cochannel concurrent speech.
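The central idea of the abstract is spectral clusters tied to one common F0, separated from nonharmonic background noise by an unsupervised EM clustering of the power density. This idea can be illustrated on a single spectral frame. The sketch below is a simplified stand-in, not the paper's formulation. It treats the power spectrum, sampled on an assumed log-frequency grid, as weights for a Gaussian mixture whose n-th component is centred at mu + log n (mu = log F0). It adds a flat noise component and runs standard EM. The spline contour that links frames, and the joint estimation over the whole utterance, are omitted. The helper names (`gauss`, `harmonic_em`, `synthetic_frame`), the fixed kernel width `sigma`, the harmonic count, and the synthetic test frame are all hypothetical.

```python
import numpy as np

# Simplified single-frame sketch. Assumptions: log-frequency axis, fixed common
# kernel width, flat noise density. This is NOT the paper's joint spline/EM method.


def gauss(x, mean, sigma):
    """Normalised Gaussian density, evaluated elementwise (broadcasts over arrays)."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


def harmonic_em(x, power, mu0, n_harm=8, sigma=0.02, n_iter=50):
    """EM fit of a harmonic Gaussian mixture plus flat noise to one spectral frame.

    x     : log-frequency grid (natural log of Hz), increasing
    power : non-negative power density sampled on x, used as EM weights
    mu0   : initial guess for log F0
    Returns (mu, w, w_noise): log F0, harmonic weights, noise weight.
    """
    x = np.asarray(x, dtype=float)
    power = np.asarray(power, dtype=float)
    log_n = np.log(np.arange(1, n_harm + 1))  # harmonic n sits at mu + log n
    span = x[-1] - x[0]                       # band width, for the flat noise density
    mu = float(mu0)
    w = np.full(n_harm, 1.0 / (n_harm + 1))
    w_noise = 1.0 / (n_harm + 1)
    for _ in range(n_iter):
        # E-step: fraction of the power at each grid point explained by each cluster.
        comp = w[:, None] * gauss(x[None, :], mu + log_n[:, None], sigma)
        noise = np.full_like(x, w_noise / span)
        total = comp.sum(axis=0) + noise + 1e-300
        resp = comp / total
        resp_noise = noise / total
        # M-step: weights are the shares of total power assigned to each component.
        mass = resp * power[None, :]
        w = mass.sum(axis=1)
        w_noise = float((resp_noise * power).sum())
        norm = w.sum() + w_noise
        w, w_noise = w / norm, w_noise / norm
        # Tied mean: each harmonic contributes its offset-corrected position x - log n.
        mu = float((mass * (x[None, :] - log_n[:, None])).sum() / mass.sum())
    return mu, w, w_noise


def synthetic_frame(f0=150.0, n_harm=8, noise_level=0.5, seed=0):
    """Hypothetical test frame: peaks at n * f0 with amplitude 1/n plus a random floor."""
    x = np.linspace(np.log(50.0), np.log(4000.0), 2000)
    power = sum(gauss(x, np.log(f0 * n), 0.02) / n for n in range(1, n_harm + 1))
    power = power + noise_level * np.random.default_rng(seed).random(x.size)
    return x, power


if __name__ == "__main__":
    x, power = synthetic_frame()
    mu, w, w_noise = harmonic_em(x, power, mu0=np.log(145.0))
    print(f"estimated F0: {np.exp(mu):.1f} Hz, noise share: {w_noise:.2f}")
```

Because every harmonic shares the single parameter mu, the M-step pools the power assigned to all harmonics when updating it, so strong harmonics can compensate for weak or noise-masked ones. The flat component absorbs power that no harmonic explains, which keeps the noise floor from dragging the harmonic clusters. Like any EM fit, this sketch needs a reasonable initial `mu0`.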
