Computational scene analysis of cocktail-party situations based on sequential Monte Carlo methods

A frequent demand on noise suppression in digital hearing aids is speech enhancement in noisy multi-talker conditions. Whereas multi-microphone array-processing techniques employing a stationary or slowly varying directivity yield an improvement in intelligibility, binaural noise suppression algorithms using the two signals recorded at the left and right ears have not yet been shown to yield a significant benefit in complex acoustical environments. We therefore explore integrating principles of auditory scene analysis into speech enhancement algorithms. From psychoacoustics it is known that common onsets, common amplitude modulation and sound source direction are among the important cues used for source separation by the human auditory system. However, it is largely unknown how the 'binding' of these different cues works. This paper proposes a possible approach to the binding problem: a new algorithm that statistically estimates the different sources using a state-space approach that integrates temporal and frequency-specific features of speech. It is based on a sequential Monte Carlo (SMC) scheme and tracks magnitude spectra and direction on a frame-by-frame basis using binaural signals. This is achieved by integrating empirically measured high-dimensional statistics of speech with directional information from head-related transfer functions. Results are shown for estimating the sound source direction of a moving voice and the spectral envelopes of two voices. They indicate that the algorithm is able to localize two superimposed sound sources and separate their spectral envelopes on-line with adaptation times of about 50 ms, which is much faster than typical blind source separation algorithms.
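
To make the frame-by-frame SMC idea concrete, the following is a minimal sketch of a bootstrap particle filter that tracks a single moving source direction. It is not the algorithm of this paper: the state here is only a scalar azimuth, the dynamics are an assumed random walk, and the likelihood is an assumed Gaussian around a noisy per-frame direction estimate, standing in for the head-related transfer functions and speech statistics used in the actual method. All function names and parameter values (`track_azimuth`, `simulate`, `process_std`, `obs_std`) are hypothetical and chosen for illustration.

```python
import math
import random


def systematic_resample(particles, weights, rng):
    """Resample particles with probability proportional to normalized weights."""
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step          # single random offset in [0, 1/n)
    resampled = []
    cumulative = weights[0]
    i = 0
    for _ in range(n):
        # Advance to the particle whose cumulative weight covers u.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
        u += step
    return resampled


def track_azimuth(observations, n_particles=500, process_std=3.0,
                  obs_std=10.0, seed=0):
    """Bootstrap particle filter over a scalar azimuth (degrees).

    Assumed model (illustrative only): random-walk dynamics and a
    Gaussian likelihood around each noisy per-frame observation.
    """
    rng = random.Random(seed)
    # Initial particles spread uniformly over the frontal half-plane.
    particles = [rng.uniform(-90.0, 90.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # 1. Predict: propagate every particle through the dynamics.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # 2. Weight: evaluate the likelihood of the current frame.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # Degenerate case: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # 3. Estimate: posterior mean of the azimuth for this frame.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # 4. Resample to concentrate particles on likely directions.
        particles = systematic_resample(particles, weights, rng)
    return estimates


def simulate(n_frames=100, obs_noise_std=10.0, seed=1):
    """Synthetic source sweeping from -40 to +40 degrees with noisy observations."""
    rng = random.Random(seed)
    truth = [-40.0 + 80.0 * t / (n_frames - 1) for t in range(n_frames)]
    obs = [a + rng.gauss(0.0, obs_noise_std) for a in truth]
    return truth, obs


if __name__ == "__main__":
    truth, obs = simulate()
    est = track_azimuth(obs)
    for t in range(0, len(truth), 20):
        print(f"frame {t:3d}: true {truth[t]:7.2f}  estimated {est[t]:7.2f}")
```

The predict, weight, estimate, resample loop is the generic structure of an SMC tracker. Extending such a sketch toward the setting described above would mean replacing the scalar state with direction plus magnitude spectra for each source and replacing the Gaussian likelihood with one derived from binaural signals.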
