Improving the intelligibility of dysarthric speech by modifying system parameters, retaining speaker's identity

Dysarthria is a neuromotor impairment of speech that affects one or more of the subsystems involved in speech production. The impairment is reflected in the acoustic characteristics of the phonemes uttered by a dysarthric speaker. A speaker who suffers from laryngeal dysfunction and improper articulation may be unable to utter some or most phonemes properly. In our work, the poorly uttered phonemes in a dysarthric speaker's utterance are located and replaced with the corresponding phonemes from a normal speaker's speech signal. However, the concatenated speech signal does not sound natural because of discontinuities at the concatenation points in short-term energy, pitch period, and formant contour. We handle the discontinuity in the short-term energy function by smoothing the short-term energy of a few frames before and after each concatenation point. Because the pitch period in the replaced segment (phoneme) differs considerably from the dysarthric speaker's pitch period, it is adjusted to resemble that of the dysarthric speaker. Pitch modification considerably improves the quality and naturalness of the utterance. The discontinuity in the formant contour arises because the replaced unit is taken from a different context, so the co-articulation effect is absent. Using linear prediction analysis, the pole locations and their corresponding radii are adjusted based on the pole locations of the adjacent phonemes. After all three modifications, the quality and naturalness of the speech signal are found to be very close to those of natural speech.
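
As an illustration of the first of the three modifications, the following is a minimal sketch of short-term energy smoothing around a concatenation point. It is not the authors' implementation: the function names (`frame_energies`, `smooth_energy_at_join`), the non-overlapping frames, the frame length, the number of frames, and the linear energy ramp are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np


def frame_energies(x, frame_len):
    """Short-term energy (sum of squared samples) of non-overlapping frames."""
    n = len(x) // frame_len
    frames = np.asarray(x[: n * frame_len], dtype=float).reshape(n, frame_len)
    return np.sum(frames ** 2, axis=1)


def smooth_energy_at_join(signal, join_idx, frame_len=200, n_frames=3):
    """Rescale n_frames frames on each side of join_idx (assumed scheme).

    Each frame gets one gain so that the frame energies follow a straight
    line from the outermost frame before the join to the outermost frame
    after it. Samples outside this region are left untouched.
    """
    out = np.asarray(signal, dtype=float).copy()
    start = join_idx - n_frames * frame_len
    end = join_idx + n_frames * frame_len
    if start < 0 or end > len(out):
        raise ValueError("not enough samples around the concatenation point")

    energies = frame_energies(out[start:end], frame_len)
    # Hypothetical target: a linear ramp between the two outermost energies.
    targets = np.linspace(energies[0], energies[-1], len(energies))

    for k, (e, t) in enumerate(zip(energies, targets)):
        if e > 0:  # skip silent frames to avoid division by zero
            s = start + k * frame_len
            out[s : s + frame_len] *= np.sqrt(t / e)
    return out


# Usage: a loud segment joined to a quiet one at sample 800.
t = np.arange(1600)
x = np.sin(2 * np.pi * t / 40.0)
x[800:] *= 0.2
y = smooth_energy_at_join(x, join_idx=800)
```

A single gain per frame keeps the sketch short, but it leaves small gain steps at frame boundaries; a practical system would more likely use overlapping windows or interpolate the gain sample by sample.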
