Automatic Detection of Depressive States from Speech

This paper investigates the acoustic and perceptual speech features that differentiate depressed individuals from healthy ones. The speech data were collected from both healthy and depressed Italian-speaking subjects, with each subject's recordings comprising a read and a spontaneous narrative. The dataset was pre-processed by extracting Mel Frequency Cepstral Coefficients (MFCCs). The resulting features were further processed using Principal Component Analysis (PCA) for correlation analysis and dimensionality reduction. The two groups were found to differ with respect to the extracted speech features. To distinguish the depressed group from the healthy one on the basis of the proposed speech processing algorithm, the Self-Organizing Map (SOM) algorithm was used. The clustering accuracy achieved by the SOMs was 80.67%.
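The processing chain summarized above (feature vectors, then PCA, then SOM clustering) can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the reported accuracy. Everything in it is an assumption made for illustration: synthetic 13-dimensional Gaussian vectors stand in for per-recording MFCC features, three principal components are retained, the SOM is a two-unit one-dimensional map with a linearly decaying learning rate and neighbourhood width, and the function names (`pca_reduce`, `train_som`, `som_assign`, `clustering_accuracy`) are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def train_som(X, n_units=2, n_epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM; returns weights of shape (n_units, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialise the units from randomly chosen samples.
    weights = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    grid = np.arange(n_units)                    # unit positions on the map
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 1e-3) # shrinking neighbourhood width
        for x in X[rng.permutation(len(X))]:
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighbourhood around the BMU on the map grid.
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def som_assign(X, weights):
    """Assign each sample to its best-matching unit."""
    d = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    return d.argmin(axis=1)

def clustering_accuracy(labels, clusters):
    """Agreement between two clusters and binary labels, under the better
    of the two possible cluster-to-label mappings."""
    acc = float(np.mean(labels == clusters))
    return max(acc, 1.0 - acc)

# Synthetic stand-in for MFCC feature vectors (assumption: 13 coefficients
# per recording, two groups with different means). Not real speech data.
rng = np.random.default_rng(42)
healthy = rng.normal(0.0, 1.0, size=(60, 13))
depressed = rng.normal(1.5, 1.0, size=(60, 13))
X = np.vstack([healthy, depressed])
y = np.array([0] * 60 + [1] * 60)

Z = pca_reduce(X, n_components=3)   # dimensionality reduction
W = train_som(Z, n_units=2)         # unsupervised clustering
clusters = som_assign(Z, W)
acc = clustering_accuracy(y, clusters)
print(f"clustering accuracy on synthetic data: {acc:.2%}")
```

Because the SOM is unsupervised, the group labels are used only afterwards, to score how well the two map units line up with the two groups. On real recordings the feature vectors would come from an MFCC extractor rather than a random generator, and the map size and number of retained components would need to be chosen for the data.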
