Co-Channel Speech and Speaker Identification Study

Abstract: This study comprised two parts. The first was to determine the effectiveness of speaker identification, using the LPC cepstral coefficient approach, under two degradation conditions: additive noise and speaker interference. The second was to develop a method for detecting co-channel speech, i.e., determining the speaker count, and to develop an effective method of either speech extraction or speech suppression to improve speaker identification under co-channel conditions. The results of the first part of the study indicate that, for the same amount of either noise or corrupting speech, for example 0 dB SNR or 0 dB TIR (target-to-interference ratio), noise is much more detrimental to speaker identification than corrupting speech. For example, with 100% corrupting speech at 0 dB TIR, a certain number of correct speaker identifications still occur, i.e., about 40% accuracy. Interfering speech at 10 dB TIR, as well as small amounts of interfering speech, i.e., 40% at 0 dB TIR, is not as detrimental to speaker identification. The results of the second part of the study indicate that a system for speaker count and speaker separation is possible. The harmonic sampling approach, developed during the study, exploits the periodic fine structure in the frequency characteristics of voiced speech. Successful reconstruction of a single speaker indicates the potential of this approach as a candidate for speech separation. It was also shown that co-channel speech can be detected using the harmonic sampling approach. Further improvements, as well as other possible approaches to the co-channel speech problem, are discussed.
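
To make the corruption conditions concrete, the following is a minimal sketch of how a test signal at a given TIR might be constructed. It is not taken from the study. The function names (`power`, `mix_at_tir`), the `fraction` parameter, and two choices are assumptions made for illustration: that TIR is measured over the overlapped segment only, and that a percentage such as "40%" means the portion of the target utterance overlapped by the interferer.

```python
import math


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_tir(target, interferer, tir_db, fraction=1.0):
    """Add a scaled interferer to the first `fraction` of `target`.

    The interferer is scaled so that, over the overlapped segment,
    10 * log10(P_target / P_interferer) equals `tir_db`.
    (Hypothetical helper; the overlap convention is an assumption.)
    """
    n = int(round(fraction * len(target)))
    if n == 0:
        return list(target)
    if len(interferer) < n:
        raise ValueError("interferer is shorter than the overlapped segment")

    p_target = power(target[:n])
    p_interf = power(interferer[:n])
    if p_target == 0 or p_interf == 0:
        raise ValueError("segments must have non-zero power")

    # TIR_dB = 10 log10(P_t / (g^2 P_i))  =>  g = sqrt(P_t / (P_i * 10^(TIR_dB/10)))
    gain = math.sqrt(p_target / (p_interf * 10 ** (tir_db / 10)))

    mixed = list(target)
    for k in range(n):
        mixed[k] += gain * interferer[k]
    return mixed


# Example: corrupt 40% of a toy "target" at 0 dB TIR.
target = [1.0, -1.0] * 5
interferer = [2.0] * 10
mixed = mix_at_tir(target, interferer, tir_db=0.0, fraction=0.4)
```

The same scaling applies to additive noise at a given SNR if the interferer is replaced by a noise sequence, which is what makes the 0 dB SNR and 0 dB TIR conditions directly comparable in terms of corrupting power.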
