Speech Segregation Using an Event-synchronous Auditory Image and STRAIGHT

We have presented methods for segregating concurrent speech sounds using an auditory model and a vocoder. Specifically, the methods combine the Auditory Image Model (AIM), a robust F0 estimator, and a synthesis module based on either STRAIGHT or an auditory synthesis filterbank. The event-synchronous procedure enhances the intelligibility of the target speaker in the presence of concurrent background speech. The resulting segregation performance is better than that of conventional comb-filter methods whenever there are errors in fundamental-frequency estimation, as there always are in real concurrent speech. Test results suggest that this auditory segregation method has potential for speech enhancement in applications such as hearing aids.
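To illustrate why comb-filter enhancement is vulnerable to F0-estimation errors, the following minimal sketch applies a simple time-domain comb filter to a synthetic two-talker mixture and compares the target-to-interferer ratio obtained with the correct F0 against a 5% mis-estimate. This is an illustration under stated assumptions, not the segregation method described above; the signal parameters, the comb_enhance and tir_db helpers, and the power-ratio metric are all chosen only for demonstration.

  # Illustrative sketch (an assumption, not the paper's implementation): a basic
  # time-domain comb filter for harmonic enhancement, showing how an F0
  # estimation error degrades the benefit for the target talker.
  import numpy as np

  fs = 16000                                  # sample rate (Hz), assumed
  t = np.arange(int(0.5 * fs)) / fs

  # Target talker approximated by a 120-Hz harmonic complex, interferer by a
  # 190-Hz harmonic complex (both assumptions for demonstration).
  f0_target, f0_interf = 120.0, 190.0
  target = sum(np.sin(2 * np.pi * k * f0_target * t) for k in range(1, 6))
  interf = sum(np.sin(2 * np.pi * k * f0_interf * t) for k in range(1, 6))

  def comb_enhance(x, f0, fs, taps=4):
      # Average the signal over several assumed pitch periods: components that
      # are harmonics of f0 add coherently, other components tend to cancel.
      period = int(round(fs / f0))
      y = np.zeros_like(x)
      for k in range(taps):
          y[k * period:] += x[:len(x) - k * period]
      return y / taps

  def tir_db(f0_used):
      # Target-to-interferer power ratio at the comb-filter output; the filter
      # is linear, so the two components can be filtered separately.
      yt = comb_enhance(target, f0_used, fs)
      yi = comb_enhance(interf, f0_used, fs)
      return 10 * np.log10(np.sum(yt ** 2) / np.sum(yi ** 2))

  print(f"T/I ratio, correct F0 : {tir_db(f0_target):5.1f} dB")
  print(f"T/I ratio, 5% F0 error: {tir_db(f0_target * 1.05):5.1f} dB")

With a well-matched F0 the target harmonics add coherently while the interferer is attenuated; even a few percent of F0 error misaligns the comb delays and erodes the advantage, illustrating the sensitivity to F0-estimation errors noted above.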