A "usable speech" extraction system was proposed (Yantorno, 1998) to extract from co-channel speech the "usable" frames that are minimally corrupted by interfering speech. Studies indicate that a significant amount of co-channel speech can be considered "usable" for speaker identification (SID). It is therefore necessary to establish criteria for identifying usable speech frames for SID. Usable speech consists entirely of voiced speech, which is shown to be information-rich for SID. In addition, SID accuracy increases as the frame-based target-to-interferer ratio (TIR) increases, when evaluated independently of the number of available segments. Krishnamachari et al. (2000) developed a frame-based spectral autocorrelation ratio (SAR) technique for determining usable frames within co-channel speech. This paper examines the ability of the SAR method to determine usable frames at various thresholds and investigates the effectiveness of a frame-based usable speech extraction technique for speaker identification.
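As an illustration of the frame-based TIR criterion mentioned above, the following is a minimal sketch, not the method used in the paper. It assumes the target and interferer signals are available separately (as in a controlled experiment where co-channel speech is formed by summing two known recordings) and takes the TIR of a frame to be the ratio of target energy to interferer energy in decibels. The function names, the frame length, and the default 20 dB threshold are hypothetical choices made for this example.

```python
import math


def frame_tir_db(target, interferer, frame_len):
    """Return the per-frame target-to-interferer ratio (TIR) in dB.

    Assumption: `target` and `interferer` are separately known sample
    sequences of the two talkers. Non-overlapping frames are used, and
    any trailing partial frame is dropped.
    """
    n_frames = min(len(target), len(interferer)) // frame_len
    tirs = []
    for i in range(n_frames):
        start = i * frame_len
        stop = start + frame_len
        t_energy = sum(x * x for x in target[start:stop])
        i_energy = sum(x * x for x in interferer[start:stop])
        if t_energy == 0.0:
            # No target energy (including fully silent frames): never usable.
            tirs.append(-math.inf)
        elif i_energy == 0.0:
            # Target present and no interference at all.
            tirs.append(math.inf)
        else:
            tirs.append(10.0 * math.log10(t_energy / i_energy))
    return tirs


def usable_frames(tirs, threshold_db=20.0):
    """Indices of frames whose TIR meets the threshold.

    The 20 dB default is a hypothetical value chosen for illustration.
    """
    return [i for i, tir in enumerate(tirs) if tir >= threshold_db]


# Toy example: the target dominates frame 0, the interferer dominates frame 1.
target = [2.0] * 4 + [0.25] * 4
interferer = [0.25] * 4 + [2.0] * 4
tirs = frame_tir_db(target, interferer, frame_len=4)
selected = usable_frames(tirs, threshold_db=10.0)
```

In practice the two signals are not separately observable, which is why a measure computed from the co-channel signal alone, such as the SAR, is needed to estimate which frames are usable.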
[1] George Zavaliagkos et al. Sub-sentence discourse models for conversational speech recognition. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), 1998.
[2] Robert E. Yantorno. Co-Channel Speech and Speaker Identification Study. 1998.
[3] Steven M. Kay et al. Cochannel speaker separation by harmonic enhancement and suppression. IEEE Trans. Speech Audio Process., 1997.
[4] Stanley J. Wenndt et al. Spectral autocorrelation ratio as a usability measure of speech segments under co-channel conditions. 2000.