A voice activity detection algorithm with sub-band detection based on time-frequency characteristics of Mandarin

Voice activity detection (VAD) algorithms are widely used in voice compression, speech synthesis, speech recognition, and speech enhancement, among other areas. This paper proposes an efficient voice activity detection algorithm with sub-band detection based on the time-frequency characteristics of Mandarin. The proposed sub-band detection consists of two parts: crosswise detection and lengthwise detection. Both energy detection and pitch detection are taken into account. To improve performance, a double-threshold criterion is used to reduce the misjudgment rate of the detection. Performance is evaluated in six noise environments at different SNRs. Experimental results indicate that the proposed algorithm detects voice regions effectively in non-stationary and low-SNR environments and has potential for further improvement.
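The abstract does not spell out how the double-threshold criterion is applied, so the following is only a minimal sketch of a generic double-threshold (hysteresis) rule on short-time frame energy, not the paper's sub-band method. The function names `frame_energies` and `double_threshold_vad`, the non-overlapping framing, and the threshold values in the usage note are all assumptions made for illustration.

```python
def frame_energies(samples, frame_len):
    """Short-time energy of consecutive non-overlapping frames.

    Trailing samples that do not fill a whole frame are ignored.
    """
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def double_threshold_vad(energies, low, high):
    """Label frames as speech with a double-threshold (hysteresis) rule.

    A run of consecutive frames above `low` is accepted as speech only if
    at least one frame in the run also exceeds `high`. This rejects weak
    noise bursts that never reach the upper threshold.
    (Hypothetical helper; the thresholds are assumed to be given.)
    """
    flags = [False] * len(energies)
    i = 0
    while i < len(energies):
        if energies[i] > low:
            # Scan to the end of the run of frames above the low threshold.
            j = i
            peak = energies[i]
            while j < len(energies) and energies[j] > low:
                peak = max(peak, energies[j])
                j += 1
            # Keep the run only if its peak clears the high threshold.
            if peak > high:
                for k in range(i, j):
                    flags[k] = True
            i = j
        else:
            i += 1
    return flags
```

For example, with illustrative thresholds `low=0.3` and `high=1.0`, the energy sequence `[0.1, 0.5, 2.0, 0.6, 0.1, 0.5, 0.4, 0.1]` yields one speech run covering frames 1 to 3. The later run at frames 5 and 6 is discarded because it never exceeds `high`. In a sub-band setting, the same rule could be applied per band before combining the decisions, but how the paper combines bands is not stated here.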
