Maximum A Posteriori Probability Multiple-Pitch Tracking Using the Harmonic Model

This paper proposes a new method for multiple fundamental frequency estimation in speech and music signals. Many well-reviewed algorithms exist for estimating the fundamental frequency of monophonic speech and music signals. For polyphonic signals, estimating each of the fundamental frequencies is more difficult, as reflected by the scarcity of existing methods that address the problem. The proposed method for fundamental frequency tracking is derived by combining the maximum likelihood and maximum a posteriori probability criteria, with each fundamental frequency modeled as a first-order Markov process. The dominant signal is modeled as a harmonic source with unknown deterministic amplitudes, while the remaining signals, including other harmonic signals, are modeled as Gaussian interference sources with an unknown covariance matrix. Once the dominant source has been estimated, it is removed by projecting the signal onto the subspace orthogonal to the one spanned by the estimated signal. This procedure is iterated over all the harmonic sources in the data. The algorithm is tested on speech, music, and synthetic signals; in each case, two harmonic sources of the same kind are mixed. Its performance is evaluated and compared with an existing reference method in terms of gross error rate as a function of signal-to-interference ratio.
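The estimate-then-project loop described above can be illustrated with a minimal single-frame sketch. It is not the paper's algorithm: it assumes white noise (so the fit reduces to ordinary least squares rather than estimation with an unknown interference covariance), omits the first-order Markov prior across frames, and searches a fixed grid of candidate frequencies. All function names and parameters (`harmonic_basis`, `estimate_dominant_f0`, `iterative_multipitch`, `n_harm`, `candidates`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def harmonic_basis(f0, n_harm, n, fs):
    """Return an n x (2*n_harm) matrix of cosines and sines at k*f0, k = 1..n_harm."""
    t = np.arange(n) / fs
    k = np.arange(1, n_harm + 1)
    phase = 2.0 * np.pi * f0 * np.outer(t, k)
    return np.hstack([np.cos(phase), np.sin(phase)])

def estimate_dominant_f0(x, candidates, n_harm, fs):
    """Grid search: pick the f0 whose harmonic subspace captures the most energy.

    Assumption: white noise, so a least-squares fit stands in for the
    likelihood criterion; no Markov prior is applied.
    """
    best_f0, best_energy, best_fit = None, -np.inf, None
    for f0 in candidates:
        A = harmonic_basis(f0, n_harm, len(x), fs)
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        fit = A @ coef                      # projection of x onto span(A)
        energy = float(np.sum(fit ** 2))
        if energy > best_energy:
            best_f0, best_energy, best_fit = float(f0), energy, fit
    return best_f0, best_fit

def iterative_multipitch(x, n_sources, candidates, n_harm, fs):
    """Estimate the dominant source, project it out, and repeat."""
    residual = np.asarray(x, dtype=float).copy()
    f0s = []
    for _ in range(n_sources):
        f0, fit = estimate_dominant_f0(residual, candidates, n_harm, fs)
        residual = residual - fit           # keep the orthogonal complement
        f0s.append(f0)
    return f0s

def synth_harmonic(f0, amp, n_harm, n, fs):
    """Toy harmonic source with equal-amplitude partials (illustrative only)."""
    t = np.arange(n) / fs
    return sum(amp * np.cos(2.0 * np.pi * k * f0 * t + 0.3 * k)
               for k in range(1, n_harm + 1))

if __name__ == "__main__":
    fs, n, n_harm = 8000, 800, 5
    mix = synth_harmonic(200.0, 1.0, n_harm, n, fs) + synth_harmonic(310.0, 0.5, n_harm, n, fs)
    grid = np.arange(150.0, 401.0, 1.0)     # candidate f0 values in Hz
    print(iterative_multipitch(mix, 2, grid, n_harm, fs))
```

Restricting the candidate grid to a plausible pitch range is one simple way to limit subharmonic errors in a sketch like this; a tracking method with a Markov prior would instead penalize implausible jumps between consecutive frames.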
