Classification of voiced and non-voiced speech signals using empirical wavelet transform and multi-level local patterns

This paper presents a novel algorithm for classifying voiced and non-voiced speech segments in noisy environments. The empirical wavelet transform (EWT), an adaptive technique for analyzing non-stationary signals, is employed in the pre-processing stage to suppress noise in the speech signals. Multi-level local patterns (MLP), a modified version of one-dimensional local binary patterns (1D-LBP), are used as features. MLPs capture local variations in a non-stationary signal by comparing each sample with the samples in its neighborhood. The resulting comparisons are quantized into multiple states, and a histogram of the MLP codes is computed for each short segment of the speech signal. A nearest neighbor classifier uses these histogram features to classify the speech segments. The proposed approach is evaluated on the publicly available CMU-Arctic database. Our experiments show that EWT-based pre-processing improves classification accuracy, and that the MLP-based approach clearly outperforms the LBP-based approach.
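The feature and classification stages described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the neighborhood size p = 8 (p/2 samples on each side), a three-level quantizer with symmetric threshold t = 0.1 times the frame's standard deviation, base-3 encoding of the states, 20 ms frames at 16 kHz, Euclidean distance in the nearest neighbor rule, and synthetic tone/noise frames standing in for CMU-Arctic speech are all assumptions. The EWT denoising stage is omitted. All function names (`local_pattern_codes`, `mlp_histogram`, `nearest_neighbor`, `synthetic_frames`) are hypothetical.

```python
import numpy as np


def local_pattern_codes(x, p=8, thresholds=(0.0,)):
    """Encode each sample by comparing it with p neighbors (p/2 on each side).

    thresholds=(0.0,) gives the binary 1D-LBP code (neighbor >= center -> 1).
    Two thresholds (-t, t) give an assumed three-level MLP-style code.
    Returns the codes and the number of possible codes (histogram size).
    """
    x = np.asarray(x, dtype=float)
    half = p // 2
    n = len(x)
    levels = len(thresholds) + 1
    center = x[half:n - half]
    offsets = [o for o in range(-half, half + 1) if o != 0]
    codes = np.zeros(len(center), dtype=np.int64)
    for r, o in enumerate(offsets):
        diff = x[half + o:n - half + o] - center
        # np.digitize maps each difference to a state in 0..levels-1;
        # with (-t, t): below -t -> 0, in [-t, t) -> 1, at or above t -> 2
        state = np.digitize(diff, thresholds)
        codes += state * levels ** r  # positional (base-`levels`) encoding
    return codes, levels ** p


def mlp_histogram(frame, p=8, alpha=0.1):
    """Normalized histogram of three-level codes; t = alpha * std (assumed)."""
    t = alpha * np.std(frame)
    codes, n_bins = local_pattern_codes(frame, p, thresholds=(-t, t))
    hist = np.bincount(codes, minlength=n_bins).astype(float)
    return hist / hist.sum()


def nearest_neighbor(train_x, train_y, query):
    """1-NN rule with Euclidean distance between histogram feature vectors."""
    dists = np.linalg.norm(train_x - query, axis=1)
    return train_y[np.argmin(dists)]


def synthetic_frames(rng, count, voiced, fs=16000, n=320):
    """Hypothetical stand-ins: voiced = 200 Hz tone + light noise,
    non-voiced = white Gaussian noise. Not real speech."""
    t = np.arange(n) / fs
    frames = []
    for _ in range(count):
        if voiced:
            phase = rng.uniform(0, 2 * np.pi)
            frames.append(np.sin(2 * np.pi * 200 * t + phase)
                          + 0.01 * rng.standard_normal(n))
        else:
            frames.append(rng.standard_normal(n))
    return frames


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = synthetic_frames(rng, 5, True) + synthetic_frames(rng, 5, False)
    train_x = np.array([mlp_histogram(f) for f in train])
    train_y = np.array(["voiced"] * 5 + ["non-voiced"] * 5)
    for label, voiced in (("voiced", True), ("non-voiced", False)):
        for f in synthetic_frames(rng, 3, voiced):
            print(label, "->", nearest_neighbor(train_x, train_y, mlp_histogram(f)))
```

Passing a single zero threshold to `local_pattern_codes` yields the binary 1D-LBP code, so the LBP and MLP features can be compared with the same code path; on real data the threshold rule and neighborhood size would need tuning.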
