Improved Silence-Unvoiced-Voiced (SUV) Segmentation for Dysarthric Speech Signals using Linear Prediction Error Variance

A novel algorithm for segmenting dysarthric speech into silence, unvoiced and voiced (SUV) segments is presented. The proposed algorithm combines short-time energy (STE), zero-crossing rate (ZCR) and linear prediction error variance (LPEV) to solve the segmentation problem. Extending previous work in this field, the proposed method addresses the difficulty of distinguishing between voiced and unvoiced segments in dysarthric speech. More precisely, the LPEV is used to design a three-fold decision matrix that can accommodate the high variability in loudness found in dysarthric speech. In addition, a moving average threshold approach is proposed to provide an “as-fit” segmentation technique that is fully automated and can handle severe dysarthric speech with varying loudness and ZCRs. The proposed fully automated algorithm is validated using real speech samples from healthy speakers and from speakers with ataxic dysarthria. The results are compared with those of known methods based on STE and ZCR. The proposed classification method not only improves segmentation performance but also provides consistent results in low signal energy situations.
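The three frame-level features named above, together with a moving average threshold, can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the abstract does not give the frame length, hop size, LP order, window, smoothing width or the decision matrix itself, so every function name and default value below is hypothetical. LPEV is computed here with the standard autocorrelation method and the Levinson-Durbin recursion, which is one common choice among several.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Mean squared amplitude of each frame (STE)."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ (ZCR)."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def lp_error_variance(frame, order=10):
    """Linear prediction error variance of one frame (LPEV).

    Assumed method: Hamming window, biased autocorrelation,
    Levinson-Durbin recursion. The order of 10 is an assumption.
    """
    w = frame * np.hamming(len(frame))
    n = len(w)
    # Autocorrelation at lags 0..order (index n-1 of the full output is lag 0).
    r = np.correlate(w, w, mode="full")[n - 1 : n + order] / n
    if r[0] <= 0.0:
        return 0.0  # all-zero frame: nothing to predict
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:  # numerically exhausted; stop early
            break
    return max(err, 0.0)

def moving_average_threshold(feature, width=5, scale=1.0):
    """Local threshold: scaled moving average of a feature track.

    `width` is assumed odd so the output has the same length as the input.
    """
    kernel = np.ones(width) / width
    padded = np.pad(feature, width // 2, mode="edge")
    return scale * np.convolve(padded, kernel, mode="valid")
```

On synthetic input, a low-frequency sinusoid (a rough stand-in for a voiced frame) should give a lower ZCR and a lower LPEV than white noise (a rough stand-in for an unvoiced frame), while an all-zero signal gives zero STE. How the three features are combined into the decision matrix is left open here, since the abstract does not specify it.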
