Cochannel speaker separation by harmonic enhancement and suppression

This paper presents a system for separating the cochannel speech of two talkers. The proposed harmonic enhancement and suppression (HES) system is based on a frame-by-frame speaker separation algorithm that exploits the pitch estimate of the stronger talker, derived from the cochannel signal. The idea behind this approach is to recover the stronger talker's speech by enhancing that talker's harmonic frequencies and formants, given a multiresolution pitch estimate. The weaker talker's speech is obtained from the residual signal that remains when the harmonics and formants of the stronger talker are suppressed. An automatic speaker assignment algorithm places the recovered frames from the target and interfering talkers in separate channels. Automatic speaker assignment performs reasonably well in most cochannel environments, including voiced-on-voiced, voiced-on-unvoiced, and unvoiced-on-unvoiced speech, assignment after silence intervals have been processed, and single-talker speech (no cochannel interference). The HES system has been tested at target-to-interferer ratios (TIRs) from -18 to 18 dB with widely available databases. It has demonstrated improved performance in keyword spotting tests for TIR values of 6, 12, and 18 dB, and in human listening tests for TIR values of -6 and -18 dB.
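The enhance-then-subtract idea described above can be illustrated with a minimal sketch. It is not the HES system itself: it assumes the stronger talker's pitch is already known for the frame, uses a simple binary comb mask in the DFT domain, and ignores formants, multiresolution pitch estimation, and speaker assignment. The function names `harmonic_mask` and `separate_frame`, the parameter `half_width_hz`, and the synthetic test tones are all hypothetical choices made for illustration.

```python
import numpy as np

def harmonic_mask(n_fft, fs, f0, half_width_hz=20.0):
    """Binary mask over rfft bins: 1 near multiples of f0, 0 elsewhere.

    Assumption: a fixed-width band around each harmonic stands in for the
    paper's harmonic enhancement; the real system is more elaborate.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # Index of the nearest harmonic (at least the first) for every bin.
    k = np.maximum(np.round(freqs / f0), 1.0)
    dist = np.abs(freqs - k * f0)
    return (dist <= half_width_hz).astype(float)

def separate_frame(frame, fs, f0_strong, half_width_hz=20.0):
    """Split one frame into a stronger-talker estimate and a residual.

    The stronger talker is estimated by keeping bins near its harmonics;
    the weaker talker is whatever remains after those bins are suppressed.
    """
    n = len(frame)
    spectrum = np.fft.rfft(frame)
    mask = harmonic_mask(n, fs, f0_strong, half_width_hz)
    stronger = np.fft.irfft(spectrum * mask, n=n)
    weaker = np.fft.irfft(spectrum * (1.0 - mask), n=n)
    return stronger, weaker

def tir_db(target, interferer):
    """Target-to-interferer ratio in dB, computed from signal energies."""
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(interferer ** 2))

# Synthetic 100 ms frame at 8 kHz: a strong 100 Hz "voice" plus a weaker
# interferer whose components (150 Hz, 450 Hz) fall between its harmonics.
fs, n = 8000, 800
t = np.arange(n) / fs
strong = (np.sin(2 * np.pi * 100 * t)
          + 0.5 * np.sin(2 * np.pi * 200 * t)
          + 0.25 * np.sin(2 * np.pi * 300 * t))
weak = 0.3 * (np.sin(2 * np.pi * 150 * t) + np.sin(2 * np.pi * 450 * t))

est_strong, est_weak = separate_frame(strong + weak, fs, f0_strong=100.0)
print("mixture TIR (dB):", round(tir_db(strong, weak), 2))
```

In this toy case every component lies exactly on a DFT bin, so the two estimates match the original signals almost exactly. With real speech, harmonics of the two talkers overlap and pitch varies within a frame, which is why the sketch should be read only as an illustration of the residual idea.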
