Perceptually Enhanced Single Frequency Filtering for Dysarthric Speech Detection and Intelligibility Assessment

This paper proposes a new speech feature representation that improves the intelligibility assessment of dysarthric speech. The feature set is motivated by human auditory perception and by the high time-frequency resolution of the single frequency filtering (SFF) technique. The proposed features are named perceptually enhanced single frequency cepstral coefficients (PE-SFCC). In the SFF stage, the speech signal is passed through a single-pole complex bandpass filter bank to obtain a high-resolution time-frequency distribution. This distribution is then enhanced using a set of auditory perceptual operators. Finally, traditional homomorphic analysis is applied to the resulting signal to obtain the PE-SFCC feature vector. The performance of the proposed features in dysarthric speech detection and intelligibility assessment is reported on the UASPEECH database. The PE-SFCC features outperformed the state-of-the-art features in both tasks.
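The filtering and homomorphic stages described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a common formulation of SFF in which the differenced signal is frequency-shifted so that the frequency of interest lands at half the sampling rate and is then passed through a single-pole filter with its pole at z = -r. The function names, the pole radius r = 0.995, the frame-averaged envelope, and the unnormalized DCT-II are all illustrative assumptions. The auditory perceptual operators are not specified in the abstract, so that stage is omitted here.

```python
import cmath
import math


def sff_envelopes(signal, sample_rate, freqs_hz, r=0.995):
    """Return one amplitude envelope per frequency (hypothetical helper).

    Assumption: each frequency f is shifted to sample_rate / 2 and the
    shifted signal is filtered by y[n] = -r * y[n-1] + x[n] (pole at z = -r).
    """
    # First-order differencing to reduce low-frequency bias.
    diff = [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]
    envelopes = []
    for f in freqs_hz:
        shift = math.pi - 2.0 * math.pi * f / sample_rate
        y = 0j
        env = []
        for n, x in enumerate(diff):
            shifted = x * cmath.exp(1j * shift * n)
            y = -r * y + shifted  # single-pole complex filter
            env.append(abs(y))    # envelope is the magnitude of the output
        envelopes.append(env)
    return envelopes


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def sff_cepstrum(envelopes, frame_len, num_coeffs):
    """Homomorphic analysis: log of frame-averaged envelopes, then DCT.

    Assumption: non-overlapping frames; the perceptual enhancement stage
    of PE-SFCC is NOT modelled here.
    """
    num_samples = len(envelopes[0])
    feats = []
    for start in range(0, num_samples - frame_len + 1, frame_len):
        log_spec = [
            math.log(sum(env[start:start + frame_len]) / frame_len + 1e-12)
            for env in envelopes
        ]
        feats.append(dct_ii(log_spec, num_coeffs))
    return feats


if __name__ == "__main__":
    sr = 8000
    tone = [math.cos(2.0 * math.pi * 1000.0 * n / sr) for n in range(2000)]
    env = sff_envelopes(tone, sr, [500.0, 1000.0, 2000.0])
    # The 1000 Hz envelope should dominate for a 1000 Hz tone.
    print([round(e[-1], 2) for e in env])
    print(len(sff_cepstrum(env, frame_len=400, num_coeffs=2)))
```

Because the envelope is available at every sample for every chosen frequency, the time resolution is not tied to an analysis window, which is the property the abstract refers to as high time-frequency resolution. A practical version would vectorize the loops and insert the perceptual operators between the envelope and logarithm steps.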
