Speech enhancement and segregation based on the localisation cue for cocktail-party processing
This paper describes a method that uses localisation information to separate concurrent speech signals. In this condition, although the speech sounds overlap in time and frequency, their localisation is a specific cue that can be exploited. The study includes the design and analysis of a double-speech corpus of stereophonic recordings. We examine the statistical relation between the time difference of arrival (TDOA) estimated in time/frequency regions and the local relative level between the two sources (known a priori), varying the size of each time/frequency region. Based on this observation, we propose a model that locally estimates the signal-to-noise ratio from this cue, with the aim of reconstructing the components of the mixture by weighting the time/frequency representation.
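As a rough illustration of the idea, the sketch below estimates a TDOA in each time/frequency bin of a stereo signal from the inter-channel phase difference, then turns it into a weight between 0 and 1. It is not the model proposed in the paper: the function names (`stft`, `local_tdoa`, `tdoa_weights`), the frame and hop sizes, the Gaussian weighting and its width `sigma` are all assumptions made for this example, and phase wrapping at high frequencies or large delays is ignored.

```python
import numpy as np

def stft(x, frame=512, hop=256):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def local_tdoa(left, right, fs, frame=512, hop=256):
    """Per-bin delay of `right` relative to `left`, in seconds.

    Positive values mean the right channel lags the left one.
    Assumption: the delay is small enough that the phase does not wrap.
    """
    L, R = stft(left, frame, hop), stft(right, frame, hop)
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)
    phase = np.angle(L * np.conj(R))           # inter-channel phase difference
    with np.errstate(divide="ignore", invalid="ignore"):
        tdoa = phase / (2 * np.pi * freqs)     # phase -> delay
    tdoa[:, 0] = 0.0                           # the DC bin carries no delay cue
    return tdoa, freqs

def tdoa_weights(tdoa, target_tdoa, sigma=1e-4):
    """Hypothetical soft mask: near 1 where the local TDOA matches the target."""
    return np.exp(-0.5 * ((tdoa - target_tdoa) / sigma) ** 2)
```

Multiplying one channel's STFT by `tdoa_weights(tdoa, target_tdoa)` and inverting the transform would give a crude estimate of the source arriving with delay `target_tdoa`. Averaging `tdoa` over blocks of neighbouring frames and bins would mimic varying the size of each time/frequency region.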