Instantaneous voiced/non-voiced detection in speech signals based on variational mode decomposition

Abstract: This paper proposes a variational mode decomposition (VMD) based method for the instantaneous detection of voiced/non-voiced (V/NV) regions in speech signals. The method applies the VMD iteratively with specific input parameters. First, the VMD decomposes the speech signal into two components; the VMD is then applied successively to one of these two components until a suitably defined convergence criterion is met. It is shown that the VMD applied iteratively in this way behaves as a low-pass filter and, after convergence, separates the fundamental frequency (F0) component from the speech signal. The envelope of the F0 component is obtained using an analytical single-degree-of-freedom (SDOF) model. A threshold is computed automatically from this envelope in order to detect the V/NV regions in the speech signal. The method has been evaluated on speech signals and the corresponding electroglottograph (EGG) signals from the CMU-Arctic database, under different noise conditions obtained from the NOISEX-92 database. Experimental results at various signal-to-noise ratios (SNRs) are included to show the effectiveness of the proposed method compared with other existing methods for V/NV detection in speech signals.
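
The last two stages described in the abstract (envelope of the F0 component, then thresholding) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the F0 component is taken as already extracted (a synthetic 120 Hz tone stands in for the output of the iterative VMD), the envelope is the magnitude of the analytic signal rather than the SDOF model used in the paper, and the threshold rule (a fixed fraction `scale` of the mean envelope) and the names `envelope`, `detect_voiced`, `smooth_ms`, and `scale` are hypothetical.

```python
import numpy as np


def envelope(x):
    """Magnitude of the analytic signal (FFT-based Hilbert transform).

    Stand-in for the SDOF-based envelope described in the abstract.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC bin
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin
        gain[1:n // 2] = 2.0           # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))


def detect_voiced(f0_component, fs, smooth_ms=20.0, scale=0.5):
    """Return a per-sample voiced mask and the threshold that produced it.

    The threshold rule (scale * mean envelope) is an assumption made for
    illustration, not the automatic threshold defined in the paper.
    """
    env = envelope(f0_component)
    win = max(1, int(fs * smooth_ms / 1000.0))
    env = np.convolve(env, np.ones(win) / win, mode="same")  # moving average
    threshold = scale * env.mean()
    return env > threshold, threshold


# Synthetic stand-in for an extracted F0 component: weak noise everywhere,
# plus a 120 Hz tone between 0.3 s and 0.7 s (the "voiced" region).
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
signal = 0.01 * rng.standard_normal(fs)
voiced_true = (t >= 0.3) & (t < 0.7)
signal[voiced_true] += np.sin(2 * np.pi * 120 * t[voiced_true])

mask, threshold = detect_voiced(signal, fs)
```

Because the decision is made per sample from the envelope, the resulting mask is instantaneous in the sense used in the abstract; the smoothing window trades boundary precision for robustness to envelope ripple.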
