Multi-pitch trajectory estimation of concurrent speech based on harmonic GMM and nonlinear Kalman filtering

Abstract

This paper describes a multi-pitch tracking algorithm for single-channel recordings of simultaneous speech from multiple speakers. At each frame, the algorithm selectively carries out one of two alternative processes: a frame-independent process or a frame-dependent process. The former is the one we previously proposed [6]; it gives good estimates of the number of speakers and of the F0s from single-frame processing. The latter, the main topic of this paper, recursively tracks the F0s using nonlinear Kalman filtering. We tested the algorithm on simultaneous speech signal data and obtained higher performance than when only the frame-independent process was used.

1. Introduction

Single-channel multi-pitch estimation may contribute to various applications, such as spontaneous dialogue speech recognition that allows competing speech, noise-robust speech recognition, especially where the noise is a harmonic signal (e.g., a telephone ring or background music), and many music applications. However, multi-pitch estimation of non-stationary signals is far from simple because of complicating factors such as spectral overlap, poor frequency resolution, and spectral widening in short-time analysis. Various approaches to this problem have been attempted [1, 2, 3], but two important tasks have been left unsolved. First, there has been no robust way of estimating the number of speakers, and most methods have had to assume for simplicity that the number is known a priori. Second, the double/half (harmonic/subharmonic) pitch error remains one of the most critical problems, for which no convincing solution has yet been proposed. Both problems arguably share the same difficulty: defining physically or mathematically proper criteria.
We previously proposed a Gaussian mixture model (GMM)-based multi-pitch estimation algorithm that operates as single-frame processing and addresses the two tasks stated above by means of an information criterion [6]. This algorithm does not yet take into account any time dependency, which would ensure the improvements in

[1] A. Jazwinski, Stochastic Processes and Filtering Theory, 1970.

[2] H. Akaike et al., Information Theory and an Extension of the Maximum Likelihood Principle, 1973.

[3] K. Nishi et al., Optimum harmonics tracking filter for auditory scene analysis, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996.

[4] Hirokazu Kameoka et al., Multi-pitch Detection Algorithm Using Constrained Gaussian Mixture Model and Information Criterion for Simultaneous Speech, 2004.

[5] David Malah et al., Optimal multi-pitch estimation using the EM algorithm for co-channel speech separation, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1993.

[6] Simon J. Godsill et al., Bayesian harmonic models for musical pitch estimation and analysis, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.