Binaural Cues for Fragment-Based Speech Recognition in Reverberant Multisource Environments

This paper addresses the problem of speech recognition using distant binaural microphones in reverberant multisource noise conditions. Our scheme employs a two stage fragment decoding approach: first spectro-temporal acoustic source fragments are identified using signal level cues, and second, a hypothesisdriven stage simultaneously searches for the most probable speech/background fragment labelling and the corresponding acoustic model state sequence. The paper reports the first successful attempt to use binaural localisation cues within this framework. By integrating binaural cues and acoustic models in a consistent probabilistic framework, the decoder is able to derive significant recognition performance benefits from fragment location estimates despite their inherent unreliability.