A Source Localization/Separation/Respatialization System Based on Unsupervised Classification of Interaural Cues

In this paper, we propose a complete computational system for Auditory Scene Analysis. This time-frequency system localizes, separates, and spatializes an arbitrary number of audio sources given only binaural signals. The localization builds on recent research frameworks in which interaural level and time differences are combined to derive a reliable direction of arrival (azimuth) at each frequency bin. Here, the power-weighted histogram constructed in the azimuth space is modeled as a Gaussian Mixture Model, whose parameters are estimated with a weighted Expectation-Maximization (EM) algorithm. Afterwards, a bank of Gaussian spatial filters is configured automatically to extract the sources with significant energy according to a posterior probability. In this frequency-domain framework, we also invert a geometrical and physical head model to derive an algorithm that simulates a source as originating from any azimuth angle.
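The mixture-fitting and filtering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one azimuth estimate and one power value per time-frequency bin are already available, and the function names (`posteriors`, `weighted_em_gmm`), the initialisation, and the iteration count are hypothetical choices made for illustration.

```python
import numpy as np


def posteriors(x, pi, mu, var):
    """Posterior probability of each Gaussian component for each azimuth in x.

    Returns an array of shape (len(x), k); each row sums to 1. These values
    can serve as soft spatial masks over the time-frequency bins.
    """
    x = np.asarray(x, dtype=float)
    d = x[:, None] - mu[None, :]
    dens = pi * np.exp(-0.5 * d ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    return dens / (dens.sum(axis=1, keepdims=True) + 1e-300)


def weighted_em_gmm(x, w, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture to azimuths x with weights w.

    x : azimuth estimates (one per time-frequency bin)
    w : non-negative weights (e.g. the power of each bin)
    Returns (pi, mu, var): mixing proportions, means, variances.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()  # normalise the weights so they sum to 1

    # Crude initialisation (an assumption of this sketch): evenly spaced
    # means, and the weighted global variance shared by all components.
    mu = np.linspace(x.min(), x.max(), k + 2)[1:-1]
    global_var = np.sum(w * (x - np.sum(w * x)) ** 2)
    var = np.full(k, max(global_var, var_floor))
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        resp = posteriors(x, pi, mu, var)
        # M-step: every sample contributes in proportion to its weight.
        nk = np.maximum(w @ resp, 1e-12)
        mu = (w * x) @ resp / nk
        sq = (x[:, None] - mu[None, :]) ** 2
        var = np.maximum((w[:, None] * resp * sq).sum(axis=0) / nk, var_floor)
        pi = nk / nk.sum()
    return pi, mu, var
```

Under these assumptions, calling `weighted_em_gmm(azimuths, powers, k)` returns the mixture parameters, and `posteriors(azimuths, pi, mu, var)` then gives, for each bin, the probability of belonging to each source; multiplying a bin's spectrum by one column of that array is one way to realise a Gaussian spatial filter. The number of components `k` is taken as given here, whereas the system described above handles an arbitrary number of sources.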
