Space-time voice activity detection

When speech-based interfaces run on small handheld devices such as cellular phones and personal digital assistants, they operate in mobile environments with unknown noise and surrounding talkers, so the system must reject every signal except the legitimate user's voice as noise. This paper proposes a new algorithm that detects the user's voice in the spatial and temporal domains using directional and spectral information, and rejects undesirable signals that originate from noise sources or surrounding talkers. Experimental results indicate that the proposed algorithm reduces the voice activity detection error rate by 34.3% relative to conventional methods.
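
To make the idea of combining directional and signal-level evidence concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a two-microphone device, estimates the inter-microphone delay of one frame by plain cross-correlation, and accepts the frame only if the delay matches the direction expected for the user and the frame energy exceeds a threshold. The frame energy is a crude stand-in for the spectral information mentioned above, and all names (`estimate_lag`, `is_user_voice`, `expected_lag`, `lag_tol`, `energy_thresh`, `max_lag`) and default values are hypothetical.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame (stand-in for a spectral feature)."""
    return sum(x * x for x in frame) / len(frame)


def estimate_lag(mic_a, mic_b, max_lag):
    """Return the lag in samples that maximizes sum(mic_a[n] * mic_b[n + lag]).

    A positive result means the signal reaches mic_b later than mic_a.
    """
    n = len(mic_a)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += mic_a[i] * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def is_user_voice(mic_a, mic_b, expected_lag,
                  lag_tol=1, energy_thresh=1e-3, max_lag=8):
    """Accept a frame only if it is loud enough AND arrives from the
    assumed user direction (expressed as an expected inter-mic lag)."""
    if frame_energy(mic_a) < energy_thresh:
        return False  # too quiet to be speech under this assumption
    lag = estimate_lag(mic_a, mic_b, max_lag)
    return abs(lag - expected_lag) <= lag_tol
```

In this sketch a frame from a surrounding talker would typically fail the lag test even when it is loud, while background silence fails the energy test. A practical detector would replace both tests with more robust estimators.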
