A unified approach to speech enhancement and voice activity detection

This paper proposes a unified system for voice activity detection (VAD) and speech enhancement in which the VAD and speech enhancement blocks exchange information with each other. The VAD block implements a new, robust algorithm that combines a periodicity measure with an energy measure obtained from the spectral energy distribution and spectral energy difference of the input speech data. The speech enhancement block uses the modified Wiener filtering (MWF) approach. The information exchange between the VAD and MWF algorithms is shown to increase the performance of both algorithms, and the unified system significantly improves the robustness of a speech recognition system. Both enhanced algorithms are noniterative, which makes the unified system computationally attractive for real-time applications.
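The two-way exchange described above can be illustrated with a minimal sketch. It is not the authors' algorithm: a textbook Wiener gain stands in for MWF, and a simple estimated speech-to-noise ratio stands in for the periodicity and spectral energy measures. The function name `unified_vad_enhance`, all parameter names and defaults, the non-overlapping rectangular frames, and the assumption that the first few frames contain only noise are illustrative choices, not details from the paper.

```python
import numpy as np

def unified_vad_enhance(signal, frame_len=256, init_noise_frames=5,
                        vad_threshold_db=3.0, noise_smooth=0.9,
                        gain_floor=0.05):
    """Single-pass (noniterative) sketch of a coupled VAD and Wiener filter.

    Hypothetical illustration only: not the MWF or VAD of the paper.
    Returns the enhanced signal and one boolean VAD flag per frame.
    """
    eps = 1e-12
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    spectra = np.fft.rfft(frames, axis=1)
    power = np.abs(spectra) ** 2

    # Assumption: the first frames contain noise only.
    noise_psd = power[:init_noise_frames].mean(axis=0)
    vad = np.zeros(n_frames, dtype=bool)
    enhanced = np.empty((n_frames, frame_len))

    for i in range(n_frames):
        # Enhancement block: speech power estimate and Wiener gain.
        speech_power = np.maximum(power[i] - noise_psd, 0.0)
        snr_prio = speech_power / (noise_psd + eps)
        gain = np.maximum(snr_prio / (1.0 + snr_prio), gain_floor)
        enhanced[i] = np.fft.irfft(gain * spectra[i], n=frame_len)

        # Enhancement -> VAD: the decision uses the enhancer's
        # speech power estimate relative to the noise estimate.
        frame_snr_db = 10.0 * np.log10(
            (speech_power.sum() + eps) / (noise_psd.sum() + eps))
        vad[i] = frame_snr_db > vad_threshold_db

        # VAD -> enhancement: update the noise estimate only
        # in frames classified as non-speech.
        if not vad[i]:
            noise_psd = (noise_smooth * noise_psd
                         + (1.0 - noise_smooth) * power[i])

    return enhanced.reshape(-1), vad

# Usage: white noise with a loud tone in frames 20..29.
rng = np.random.default_rng(0)
frame_len = 256
sig = 0.01 * rng.standard_normal(40 * frame_len)
n = np.arange(10 * frame_len)
sig[20 * frame_len:30 * frame_len] += np.sin(2 * np.pi * 0.1 * n)
enhanced, vad = unified_vad_enhance(sig, frame_len=frame_len)
```

Each frame is processed once, which mirrors the noniterative property the abstract emphasizes. A practical system would add windowing and overlap-add, which this sketch omits for brevity.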
