Paper information - Voice Activity Detection Algorithm Using Spectral-Correlation and Wavelet-Packet Transformation

Voice Activity Detection Algorithm Using Spectral-Correlation and Wavelet-Packet Transformation

This paper develops a voice activity detection algorithm based on a noise classification technique. Spectral-correlation and wavelet-packet (WP) features of frames are proposed for voice activity estimation. Three WP trees are tested for effective representation of audio segments: a mel-scaled wavelet-packet tree, a Bark-scaled wavelet-packet tree, and an ERB-scaled (equivalent rectangular bandwidth) wavelet-packet tree. Using only two principal components of the WP features is enough to classify the environmental noise accurately. A wavelet-packet tree designed according to the equivalent rectangular bandwidth concept for acoustic feature extraction increases the classification accuracy of voice/silence segments by at least 4% compared with other classification-based voice activity detection algorithms across different noise types.
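The idea of reducing wavelet-packet features to two principal components can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a full, uniform Haar wavelet-packet tree (not the mel-, Bark-, or ERB-scaled trees studied in the paper), log sub-band energies as the features, and random frames as stand-in data. The names `haar_wp_energies` and `pca_project` are hypothetical.

```python
import numpy as np

def haar_wp_energies(frame, levels=3):
    """Log energy of every terminal node of a full Haar wavelet-packet tree.

    Assumption: a uniform tree with 2**levels leaves, in natural (not
    frequency) order. The frame length must be divisible by 2**levels.
    """
    nodes = [np.asarray(frame, dtype=float)]
    for _ in range(levels):
        split = []
        for x in nodes:
            even, odd = x[0::2], x[1::2]
            split.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            split.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        nodes = split
    energies = np.array([np.sum(n ** 2) for n in nodes])
    return np.log(energies + 1e-12)  # small offset avoids log(0)

def pca_project(features, n_components=2):
    """Project row-wise feature vectors onto their leading principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Stand-in data: 50 random frames of 256 samples (hypothetical, not real audio).
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 256))
features = np.stack([haar_wp_energies(f) for f in frames])
projected = pca_project(features)  # one 2-D point per frame
```

In a setup like the one the abstract describes, the two-dimensional points would then go to a classifier that labels the noise type. That step is omitted here because the abstract does not specify it.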

O. Korniienko | E. A. Machusky
