Separation of harmonic structures based on tied Gaussian mixture model and information criterion for concurrent sounds

A method is described for separating the harmonic structures of concurrent sounds in a cochannel input. Multiple harmonic structures are modeled as a mixture of tied Gaussian mixtures, in which each tied Gaussian mixture models a single harmonic structure. The algorithm estimates both the number and the shape of the underlying harmonic structures by combining maximum likelihood estimation of the model parameters, computed with the EM algorithm, with an information criterion. It places no restriction on the number of mixed sounds or on the variety of sound sources, and it extracts accurate fundamental frequencies continuously using simple procedures in the spectral domain. Experiments showed high performance of the algorithm for both simultaneous speech and polyphonic music.
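To make the idea of a tied Gaussian mixture concrete, the following is a minimal sketch, not the authors' implementation. It assumes a log-frequency axis on which the n-th harmonic of a sound with log-fundamental `mu` lies at `mu + log(n)`, so all component means are tied to the single parameter `mu`. The fixed shared width `sigma`, the use of spectral powers `w` as sample weights, the function names, and the choice of AIC as the information criterion are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def _log_terms(xi, mu, r, sigma):
    """Log of r_n * N(xi; mu + log(n), sigma^2) for harmonics n = 1..len(r)."""
    norm = -math.log(sigma * math.sqrt(2.0 * math.pi))
    return [math.log(max(rn, 1e-300)) + norm
            - 0.5 * ((xi - mu - math.log(n)) / sigma) ** 2
            for n, rn in enumerate(r, start=1)]


def _logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_tied_gmm(x, w, mu0, n_harm=3, sigma=0.03, n_iter=20):
    """EM for one tied Gaussian mixture (hypothetical helper).

    x: log-frequencies of spectral samples, w: their nonnegative powers.
    Only mu (the log-fundamental) and the mixing weights r are updated;
    sigma is held fixed, which is an assumption of this sketch.
    """
    mu, r = mu0, [1.0 / n_harm] * n_harm
    total_w = sum(w)
    for _ in range(n_iter):
        mu_acc, r_acc = 0.0, [0.0] * n_harm
        for xi, wi in zip(x, w):
            logp = _log_terms(xi, mu, r, sigma)
            lse = _logsumexp(logp)
            for n in range(n_harm):
                g = math.exp(logp[n] - lse)  # E-step: responsibility of harmonic n+1
                mu_acc += wi * g * (xi - math.log(n + 1))
                r_acc[n] += wi * g
        # M-step: tied-mean update for mu, then the mixing weights
        mu = mu_acc / total_w
        r = [a / total_w for a in r_acc]
    return mu, r


def weighted_log_likelihood(x, w, mu, r, sigma=0.03):
    """Power-weighted log-likelihood of the samples under the fitted mixture."""
    return sum(wi * _logsumexp(_log_terms(xi, mu, r, sigma))
               for xi, wi in zip(x, w))


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a preferred model."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Toy spectrum: peaks at 100, 200 and 300 Hz with decaying power.
x = [math.log(f) for f in (100.0, 200.0, 300.0)]
w = [1.0, 0.5, 0.25]
mu, r = fit_tied_gmm(x, w, mu0=math.log(95.0))
f0 = math.exp(mu)  # estimated fundamental frequency in Hz
```

Under these assumptions, a single tied mixture has `1 + (n_harm - 1)` free parameters (the log-fundamental plus the independent mixing weights). Fitting models with different numbers of tied mixtures and comparing `aic(weighted_log_likelihood(...), n_params)` is one way the number of harmonic structures could be selected.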
