Voiceprint recognition of Parkinson patients based on deep learning

More than 90% of Parkinson's Disease (PD) patients suffer from vocal disorders, so speech impairment is already an indicator of PD. This study focuses on PD diagnosis through voiceprint features. A method based on Deep Neural Network (DNN) recognition and classification, combined with Mini-Batch Gradient Descent (MBGD), is proposed to distinguish PD patients from healthy people using voiceprint features. To extract the voiceprint features from patients, Weighted Mel Frequency Cepstrum Coefficients (WMFCC) are applied. The proposed method is tested on experimental data obtained from voice recordings of three sustained vowels, /a/, /o/ and /u/, from participants (48 PD patients and 20 healthy people). The results show that the proposed method distinguishes PD patients from healthy people with higher accuracy than conventional methods such as the Support Vector Machine (SVM) and the other methods discussed in this paper. The accuracy achieved is 89.5%. The WMFCC approach addresses the problem that the high-order cepstrum coefficients are small and that the feature components therefore represent the audio only weakly. MBGD reduces the computational load of evaluating the loss function and increases the training speed of the system. The DNN classifier enhances the classification ability of the voiceprint features. Together, these approaches can provide a solid solution for quick auxiliary diagnosis of PD at an early stage.
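
The computational saving attributed to MBGD comes from estimating the gradient of the loss on a small batch of samples at each update instead of on the whole training set. The following is only a minimal sketch of that idea, not the method of the paper: it assumes a single-layer logistic classifier in place of the DNN, and synthetic two-dimensional points in place of WMFCC features. All function names, the batch size, the learning rate and the toy data are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_mbgd(features, labels, batch_size=8, lr=0.1, epochs=200, seed=0):
    """Fit a logistic classifier with mini-batch gradient descent (illustrative)."""
    rng = random.Random(seed)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    indices = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new batch composition in every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for i in batch:
                z = bias + sum(w * x for w, x in zip(weights, features[i]))
                err = sigmoid(z) - labels[i]  # d(cross-entropy)/dz
                for j in range(n_features):
                    grad_w[j] += err * features[i][j]
                grad_b += err
            # The gradient is averaged over the batch only, not the full set.
            for j in range(n_features):
                weights[j] -= lr * grad_w[j] / len(batch)
            bias -= lr * grad_b / len(batch)
    return weights, bias


def predict(weights, bias, x):
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(z) >= 0.5 else 0


def make_toy_data(n_per_class=40, seed=1):
    # Synthetic stand-in for feature vectors: two Gaussian clusters.
    rng = random.Random(seed)
    features, labels = [], []
    for label, centre in ((0, -1.5), (1, 1.5)):
        for _ in range(n_per_class):
            features.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            labels.append(label)
    return features, labels


if __name__ == "__main__":
    X, y = make_toy_data()
    w, b = train_mbgd(X, y)
    correct = sum(predict(w, b, x) == t for x, t in zip(X, y))
    print(f"training accuracy on toy data: {correct / len(y):.3f}")
```

Each update here touches only `batch_size` samples, so the cost per step is independent of the size of the training set; a full-batch variant would correspond to setting `batch_size` to the number of samples.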
