Speech enhancement and segregation based on the localisation cue for cocktail-party processing

This paper describes a method that uses localisation information to separate concurrent speech signals. In such conditions, although the speech sounds overlap in time and frequency, their localisation is a specific cue that can be exploited. The study includes the design and analysis of a double-speech corpus of stereophonic recordings. We examine the statistical relation between the time difference of arrival (TDOA) estimated in time/frequency regions and the local relative level between the two sources (known a priori), while varying the size of each time/frequency region. Based on this observation, we propose a model that estimates the local signal-to-noise ratio from this cue, with the aim of reconstructing the components of the mixture by weighting the time/frequency representation.
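As an illustration of weighting the time/frequency domain from a local TDOA estimate, the following is a minimal sketch and not the model proposed in the paper. It assumes a phase-difference TDOA estimate per time/frequency bin and a Gaussian weight centred on a known target delay. The function names (`stft`, `local_tdoa`, `tdoa_weights`), the parameters `target_tdoa` and `sigma`, and the Gaussian form itself are all hypothetical choices made for the example.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)


def local_tdoa(left, right, fs, n_fft):
    """Per-bin delay estimate in seconds from the inter-channel phase difference.

    Positive values mean the right channel lags the left one. The estimate is
    only unambiguous while |2*pi*f*delay| < pi (no phase wrapping).
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)[:, None]
    phase = np.angle(left * np.conj(right))
    with np.errstate(divide="ignore", invalid="ignore"):
        tau = phase / (2.0 * np.pi * freqs)
    tau[0, :] = 0.0  # the DC bin carries no delay information
    return tau


def tdoa_weights(tau, target_tdoa, sigma):
    """Assumed Gaussian weighting: close to 1 where the local TDOA matches the
    target source, decaying towards 0 elsewhere. Stands in for a local
    signal/noise ratio model; the shape is an assumption of this sketch."""
    return np.exp(-0.5 * ((tau - target_tdoa) / sigma) ** 2)
```

Given the STFTs `L` and `R` of the two channels, `tdoa_weights(local_tdoa(L, R, fs, n_fft), target_tdoa, sigma) * L` would give a weighted spectrogram in which regions dominated by the target direction are retained. The analysis resolution (`n_fft`, `hop`) plays the role of the time/frequency region size varied in the study.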