Abstract

This paper describes a multi-pitch tracking algorithm for 1-channel signals containing simultaneous speech from multiple speakers. At each frame, the algorithm selectively carries out one of two alternative processes: a frame-independent process or a frame-dependent process. The former, which we previously proposed [6], gives good estimates of the number of speakers and of the F0s using single-frame processing. The latter, the main topic of this paper, recursively tracks the F0s using nonlinear Kalman filtering. We tested the algorithm on simultaneous speech signal data, and it showed higher performance than when only the frame-independent process was used.

1. Introduction

1-channel multi-pitch estimation may contribute to various applications, such as spontaneous dialogue speech recognition that allows competing speech; noise-robust speech recognition, especially where the noise is a harmonic signal (e.g., telephone ring, background music); and many music applications. However, multi-pitch estimation of non-stationary signals is far from simple because of complicating factors such as spectral overlap, poor frequency resolution, and spectral widening in short-time analysis. Various approaches to this problem have been attempted [1, 2, 3], but two important tasks have been left unsolved. First, there has been no robust way of estimating the number of speakers, and most methods have had to assume for simplicity that the number is known a priori. Second, the double/half (harmonic/subharmonic) pitch error remains one of the most critical problems, for which convincing solutions have not yet been proposed. Both problems arguably share the same difficulty: defining physically or mathematically proper criteria.
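The half-pitch ambiguity mentioned above can be illustrated with a toy example. The sketch below is not the GMM-based method or the information criterion of [6]; it is a simplified peak-matching score, and the function names, the tolerance, and the penalty weight are all assumptions made for illustration. It shows that a harmonic comb at half the true F0 explains every spectral peak just as well as the true F0 does, so a criterion based on fit alone cannot separate them, whereas a penalty on the number of harmonics used (a crude stand-in for a model-complexity term) favors the true F0.

```python
def harmonic_fit(peaks_hz, f0_hz, max_freq_hz, tol_hz=1.0):
    """Count peaks not explained by a harmonic comb at f0_hz.

    Returns (unexplained_peak_count, number_of_harmonics_in_comb).
    """
    n_harmonics = int(max_freq_hz // f0_hz)
    comb = [k * f0_hz for k in range(1, n_harmonics + 1)]
    unexplained = sum(
        1 for p in peaks_hz if min(abs(p - c) for c in comb) > tol_hz
    )
    return unexplained, n_harmonics


def penalized_score(peaks_hz, f0_hz, max_freq_hz, penalty_per_harmonic=0.5):
    """Fit error plus an assumed complexity penalty (lower is better)."""
    unexplained, n_harmonics = harmonic_fit(peaks_hz, f0_hz, max_freq_hz)
    return unexplained + penalty_per_harmonic * n_harmonics


# Hypothetical spectral peaks of a single voice with true F0 = 200 Hz.
peaks = [200.0 * k for k in range(1, 6)]
max_freq = 1000.0
candidates = [100.0, 200.0, 400.0]  # half, true, and double pitch

# Fit alone: the half-pitch comb (100 Hz) matches every peak too.
fit_only = {f0: harmonic_fit(peaks, f0, max_freq)[0] for f0 in candidates}

# With the complexity penalty, the true F0 has the lowest score.
penalized = {f0: penalized_score(peaks, f0, max_freq) for f0 in candidates}
best = min(candidates, key=lambda f0: penalized[f0])
```

In this toy setting the 100 Hz and 200 Hz candidates both leave no peak unexplained, while the 400 Hz candidate misses the odd harmonics. The choice of penalty weight is arbitrary here, which reflects the difficulty the text describes: the result depends on how the criterion is defined.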
So far, we have proposed a GMM (Gaussian mixture model)-based multi-pitch estimation algorithm that works on a single frame and addresses the two tasks stated above by means of an information criterion [6]. This algorithm does not yet take into account any time-dependency property, which would ensure improvements in
[1] A. Jazwinski. Stochastic Processes and Filtering Theory. 1970.
[2] H. Akaike, et al. Information Theory and an Extension of the Maximum Likelihood Principle. 1973.
[3] K. Nishi, et al. Optimum harmonics tracking filter for auditory scene analysis. 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996.
[4] Hirokazu Kameoka, et al. Multi-pitch Detection Algorithm Using Constrained Gaussian Mixture Model and Information Criterion for Simultaneous Speech. 2004.
[5] David Malah, et al. Optimal multi-pitch estimation using the EM algorithm for co-channel speech separation. 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1993.
[6] Simon J. Godsill, et al. Bayesian harmonic models for musical pitch estimation and analysis. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.