Voice Activity Detection Algorithm Using Spectral-Correlation and Wavelet-Packet Transformation

This paper develops a voice activity detection algorithm based on a noise classification technique. Spectral-correlation and wavelet-packet (WP) features of frames are proposed for voice activity estimation. Three WP trees are tested for effective representation of audio segments: a mel-scaled wavelet packet tree, a Bark-scaled wavelet packet tree, and an ERB-scaled (equivalent rectangular bandwidth) wavelet packet tree. Using only two principal components of the WP features is sufficient to classify the environmental noise accurately. A wavelet-packet tree designed according to the equivalent rectangular bandwidth concept for acoustic feature extraction increases voice/silence segment classification accuracy by at least 4% compared with other classification-based voice activity detection algorithms under different noise conditions.
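As a rough illustration of what per-frame wavelet-packet features can look like, the sketch below computes log sub-band energies from a full wavelet-packet tree. It is not the paper's method: the Haar filter, the uniform depth-3 tree (rather than a mel-, Bark-, or ERB-scaled tree), and the function names `haar_step`, `wp_leaves`, and `wp_log_energies` are all assumptions chosen to keep the example short and dependency-free.

```python
import math


def haar_step(x):
    """One Haar analysis step: return (approximation, detail), each half length."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wp_leaves(frame, depth):
    """Full wavelet-packet tree (assumed uniform): split every node `depth` times.

    Leaves are returned in natural (filter-bank) order, not frequency order.
    The frame length should be divisible by 2**depth.
    """
    nodes = [list(frame)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def wp_log_energies(frame, depth=3, eps=1e-12):
    """Log energy of each leaf sub-band; eps avoids log(0) on silent bands."""
    return [
        math.log(sum(c * c for c in leaf) + eps)
        for leaf in wp_leaves(frame, depth)
    ]


if __name__ == "__main__":
    # Hypothetical 16-sample frame, used only to show the call pattern.
    frame = [math.sin(2.0 * math.pi * 3.0 * n / 16.0) for n in range(16)]
    print(wp_log_energies(frame, depth=3))
```

Because the Haar step is orthonormal, the leaf energies sum to the energy of the input frame. A perceptually scaled tree, as the abstract describes, would split only selected nodes so that the leaf bandwidths follow the chosen auditory scale; the resulting feature vectors could then be reduced with principal component analysis before classification.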
