Dysarthric speaker identification with constrained training durations

Dysarthria is a neurological speech disorder that causes phonemes to be pronounced poorly or not at all. To support biometric identification of dysarthric speakers when training data is limited, we propose a recognition framework based on score-level fusion of two systems: the first uses classical Mel Frequency Cepstral Coefficients (MFCCs), while the second uses Auditory Cues (ACs), which simulate the outer, middle, and inner parts of the ear. A simple energy-based voice activity detector (VAD) is incorporated in both systems, and its impact on performance is evaluated. Experiments are conducted on the Nemours and Torgo databases, with Gaussian Mixture Models (GMMs) used for speaker modeling. The results demonstrate the effectiveness of the energy-based VAD, especially for the MFCC-based system. Moreover, the complementarity of the two feature sets is shown by a significant gain in identification performance for the fused system across different training durations. Notably, the proposed system surpasses state-of-the-art results and achieves 100% correct speaker identification in the long-duration training scenario.
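
The pipeline described above, energy-based VAD followed by score-level fusion of two per-speaker score sets, can be illustrated with a minimal sketch. The abstract does not specify the VAD threshold, the score normalization, or the fusion weight, so the 30 dB margin, the min-max normalization, and the equal weighting below are assumptions, and all function names are hypothetical. The per-speaker scores stand in for GMM log-likelihoods produced by the MFCC-based and AC-based systems.

```python
import math


def energy_vad(frames, margin_db=30.0):
    """Keep frames whose log-energy lies within margin_db of the loudest frame.

    The margin value is an assumption; the source only says the VAD is
    energy based.
    """
    energies = [
        10.0 * math.log10(sum(x * x for x in frame) / len(frame) + 1e-12)
        for frame in frames
    ]
    threshold = max(energies) - margin_db
    return [frame for frame, e in zip(frames, energies) if e > threshold]


def min_max_normalize(scores):
    """Map per-speaker scores to [0, 1] so two systems become comparable.

    Min-max normalization is an assumed choice, not taken from the source.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {spk: 0.0 for spk in scores}
    return {spk: (s - lo) / span for spk, s in scores.items()}


def fuse_scores(mfcc_scores, ac_scores, weight=0.5):
    """Weighted sum of normalized scores (weight is a hypothetical parameter)."""
    m = min_max_normalize(mfcc_scores)
    a = min_max_normalize(ac_scores)
    return {spk: weight * m[spk] + (1.0 - weight) * a[spk] for spk in m}


def identify(fused_scores):
    """Return the speaker with the highest fused score."""
    return max(fused_scores, key=fused_scores.get)


# Toy frames: silence, speech-like, and very low-level noise.
frames = [[0.0, 0.0], [0.5, -0.5], [0.001, 0.001]]
voiced = energy_vad(frames)

# Toy log-likelihood scores for three enrolled speakers (made-up values).
mfcc_scores = {"spk1": -52.0, "spk2": -48.0, "spk3": -50.0}
ac_scores = {"spk1": -30.0, "spk2": -33.0, "spk3": -29.0}
fused = fuse_scores(mfcc_scores, ac_scores)
print(len(voiced), identify(fused))
```

In this toy case the MFCC scores alone favor `spk2` and the AC scores alone favor `spk3`; after fusion `spk3` ranks first, which shows how a second feature stream can change the decision. In practice the fusion weight would be tuned on held-out data.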
