Clustering-based voiced-unvoiced-silence detection in speech using temporal and spectral parameters

This paper reports automatic segmentation of the voiced, unvoiced and silence portions of speech on the TIMIT database. Waveform (time-domain) and frequency-domain parameters are used to form a multidimensional feature space. A short-time energy threshold derived from the unvoiced segments is used to separate silence or background from speech. Spectral clustering based on a Gaussian similarity function is then used to separate the voiced and unvoiced (V/UV) portions of the speech and to evaluate the error performance. The V/UV classification accuracy is measured and the result is compared with other techniques available in the literature. The proposed technique provides at least 98.3% V/UV detection accuracy.
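The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (log short-time energy and zero-crossing rate), the frame length and hop, the fixed silence threshold `silence_log_energy`, the kernel width `sigma`, and all function names are assumptions chosen for illustration, and the input is a synthetic signal rather than TIMIT data. Silence is removed first by an energy threshold, and the remaining frames are split in two by the sign of the second eigenvector of the normalized graph Laplacian built from a Gaussian similarity matrix.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Per-frame log short-time energy and zero-crossing rate (assumed features)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, 2))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energy = np.mean(frame ** 2)
        zcr = np.mean(np.diff(np.sign(frame)) != 0)
        feats[i] = (np.log(energy + 1e-12), zcr)
    return feats

def spectral_bipartition(feats, sigma=1.0):
    """Two-way spectral clustering with a Gaussian similarity function."""
    # Standardize so both features contribute comparably to the distance.
    x = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-12)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))        # Gaussian similarity
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    l_sym = np.eye(len(x)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(l_sym)               # eigenvalues ascending
    fiedler = eigvecs[:, 1] * d_inv_sqrt             # second eigenvector
    return fiedler > 0

def classify_frames(signal, silence_log_energy=-13.0, sigma=1.0):
    """Label each frame 'V', 'UV' or 'S'. The threshold is a hypothetical value."""
    feats = frame_features(signal)
    labels = np.full(len(feats), "S", dtype="<U2")
    speech = feats[:, 0] >= silence_log_energy       # energy-based silence removal
    speech_feats = feats[speech]
    side = spectral_bipartition(speech_feats, sigma)
    # Call the cluster with the higher mean log energy "voiced".
    voiced_side = speech_feats[side, 0].mean() > speech_feats[~side, 0].mean()
    labels[speech] = np.where(side == voiced_side, "V", "UV")
    return labels

# Synthetic stand-in for speech: a 150 Hz tone, weak noise, near-silence.
rng = np.random.default_rng(0)
t = np.arange(8000) / 16000.0
voiced = np.sin(2 * np.pi * 150 * t)
unvoiced = 0.01 * rng.standard_normal(8000)
silence = 1e-4 * rng.standard_normal(8000)
labels = classify_frames(np.concatenate([voiced, unvoiced, silence]))
```

On real speech the threshold and `sigma` would need to be tuned, and a richer feature set would likely be required; the dense similarity matrix also grows quadratically with the number of frames, so long recordings would call for a sparse or batched variant.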
