Multi-pitch Detection Algorithm Using Constrained Gaussian Mixture Model and Information Criterion for Simultaneous Speech

In this paper, a co-channel multi-pitch detection algorithm is described. We suggest the importance of this when prosodic information is need to be extracted separately from respective F0 patterns of concurrent utterances. Though temporal continuity of speech prosody should be considered, we discuss a process done independently on each single frame as the first step. A model of multiple harmonic structures is constructed with a mixture of tied Gaussian mixtures with which a single harmonic structure is modeled. Our algorithm enables to detect both a number of concurrent speakers, and each spectral envelope of underlying harmonic structure based on a maximum likelihood estimation of the model parameters using EM algorithm and an information criterion. It operates without a priori information of F0 contours and a restriction of a number of speakers, and it also extracts accurate F0s as continuous values with simple procedures in spectral domain. Experiments showed our algorithm outperformed well-known cepstrum for both speech signals of a single speaker and simultaneous two speakers.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  Anssi Klapuri,et al.  Robust Multipitch Estimation for the Analysis and Manipulation of Polyphonic Musical Signals , 2000 .

[3]  H. Akaike,et al.  Information Theory and an Extension of the Maximum Likelihood Principle , 1973 .

[4]  Mototsugu Abe,et al.  Auditory scene analysis based on time-frequency integration of shared FM and AM (II): Optimum time-domain integration and stream sound reconstruction , 2002, Systems and Computers in Japan.

[5]  Guy J. Brown,et al.  A multi-pitch tracking algorithm for noisy speech , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Matti Karjalainen,et al.  Multi-pitch and periodicity analysis model for sound separation and auditory scene analysis , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[7]  Anssi Klapuri,et al.  Separation of harmonic sounds using linear models for the overtone series , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  David Malah,et al.  Optimal multi-pitch estimation using the EM algorithm for co-channel speech separation , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Simon J. Godsill,et al.  Bayesian harmonic models for musical pitch estimation and analysis , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.