A deep transfer learning approach for improved post-traumatic stress disorder diagnosis

Post-traumatic stress disorder (PTSD) is a trauma- and stressor-related disorder that develops after exposure to a traumatic or adverse environmental event that caused serious harm or injury. The structured interview is the only widely accepted clinical practice for PTSD diagnosis, but it suffers from several limitations, including the stigma associated with the disease. Diagnosing PTSD by analyzing speech signals has been investigated as an alternative in recent years: speech signals are processed to extract frequency features, and these features are then fed into a classification model. In this paper, we developed a deep belief network (DBN) model combined with a transfer learning (TL) strategy for PTSD diagnosis. We computed three categories of speech features and used the DBN model to fuse them. The TL strategy transfers knowledge learned from a large speech recognition database, TIMIT, to PTSD detection, where patient data are difficult to collect. We evaluated the proposed methods on two PTSD speech databases, each of which consists of audio recordings from 26 patients. In a comparison with other popular methods, the state-of-the-art support vector machine (SVM) classifier achieved an accuracy of only 57.68%, whereas the TL strategy boosted the accuracy of the DBN from 61.53% to 74.99%. Altogether, our method provides a pragmatic and promising tool for PTSD diagnosis. Preliminary results of this study were presented in Banerjee (in: 2017 IEEE international conference on data mining (ICDM), IEEE, 2017).
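The transfer learning idea summarized above can be illustrated with a minimal sketch. It is not the DBN pipeline of the paper: it assumes a plain one-hidden-layer network trained by gradient descent, synthetic random data in place of TIMIT and the PTSD recordings, and hypothetical names (`init_params`, `train`, `transfer_hidden_layer`) and sizes chosen only for illustration. It shows the general pattern of learning a hidden layer on a large source task, reusing it to initialize a model for a small target task, and fine-tuning.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, n_out, rng):
    # Small random weights, zero biases.
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    # Sigmoid hidden layer followed by a softmax output layer.
    H = sigmoid(X @ params["W1"] + params["b1"])
    logits = H @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    return H, P

def train(params, X, y, epochs=200, lr=0.5):
    # Full-batch gradient descent on the cross-entropy loss.
    Y = np.eye(params["W2"].shape[1])[y]  # one-hot labels
    for _ in range(epochs):
        H, P = forward(params, X)
        d_logits = (P - Y) / len(X)
        d_H = d_logits @ params["W2"].T * H * (1.0 - H)
        params["W2"] -= lr * H.T @ d_logits
        params["b2"] -= lr * d_logits.sum(axis=0)
        params["W1"] -= lr * X.T @ d_H
        params["b1"] -= lr * d_H.sum(axis=0)
    return params

def transfer_hidden_layer(source_params, n_out, rng):
    # Copy the source hidden layer; start a fresh output layer for the target task.
    n_hidden = source_params["W1"].shape[1]
    return {
        "W1": source_params["W1"].copy(),
        "b1": source_params["b1"].copy(),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

rng = np.random.default_rng(0)

# Synthetic "source" task: many samples, 5 classes (a stand-in for a large corpus).
X_src = rng.normal(size=(500, 20))
y_src = np.argmax(X_src @ rng.normal(size=(20, 5)), axis=1)
source = train(init_params(20, 16, 5, rng), X_src, y_src)

# Synthetic "target" task: few samples, 2 classes (a stand-in for scarce patient data).
X_tgt = rng.normal(size=(26, 20))
y_tgt = (X_tgt[:, 0] > 0).astype(int)
target = transfer_hidden_layer(source, 2, rng)
target = train(target, X_tgt, y_tgt, epochs=100, lr=0.1)  # fine-tune all layers

_, P_tgt = forward(target, X_tgt)
print("target training accuracy:", np.mean(P_tgt.argmax(axis=1) == y_tgt))
```

Fine-tuning here updates every layer with a smaller learning rate; freezing the transferred hidden layer and training only the new output layer is a common alternative when the target set is very small.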

[1]  A. Statnikov,et al.  Bridging a translational gap: using machine learning to improve the prediction of PTSD , 2015, BMC Psychiatry.

[2]  Jiang Li,et al.  A Deep Transfer Learning Approach for Improved Post-Traumatic Stress Disorder Diagnosis , 2017, 2017 IEEE International Conference on Data Mining (ICDM).

[3]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[4]  Mireia Farrús,et al.  Jitter and shimmer measurements for speaker recognition , 2007, INTERSPEECH.

[5]  Benjamin Schrauwen,et al.  End-to-end learning for music audio , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6]  Michael G. Kenny,et al.  The Harmony of Illusions: Inventing Post-Traumatic Stress Disorder. , 1997 .

[7]  Jonathan G. Fiscus,et al.  DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM (TIMIT), NIST , 1993 .

[8]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[9]  Louis-Philippe Morency,et al.  Investigating voice quality as a speaker-independent indicator of depression and PTSD , 2013, INTERSPEECH.

[10]  A. Statnikov,et al.  Utilization of machine learning for prediction of post-traumatic stress: a re-examination of cortisol in the prediction and pathways to non-remitting PTSD , 2017, Translational Psychiatry.

[11]  A. Statnikov,et al.  Quantitative forecasting of PTSD from early trauma responses: a Machine Learning application. , 2014, Journal of psychiatric research.

[12]  John H. L. Hansen,et al.  Robust Emotional Stressed Speech Detection Using Weighted Frequency Subbands , 2011, EURASIP J. Adv. Signal Process.

[13]  Yoshua Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn.

[14]  Sriram Ramaswamy,et al.  A primary care perspective of posttraumatic stress disorder for the Department of Veterans Affairs. , 2005, Primary care companion to the Journal of clinical psychiatry.

[15]  Dimitra Vergyri,et al.  Speech-based assessment of PTSD in a military population using diverse feature classes , 2015, INTERSPEECH.

[16]  Jieping Ye,et al.  Deep Model Based Transfer and Multi-Task Learning for Biological Image Analysis , 2015, IEEE Transactions on Big Data.

[17]  Shashidhar G. Koolagudi,et al.  Characterization and recognition of emotions from speech using excitation source information , 2013, Int. J. Speech Technol.

[18]  E. V. D. Broek,et al.  Telling the Story and Re-Living the Past: How Speech Analysis Can Reveal Emotions in Post-traumatic Stress Disorder (PTSD) Patients , 2010 .

[19]  J Douglas Bremner,et al.  Post-traumatic stress disorder and memory: prescient medicolegal testimony at the International War Crimes Tribunal? , 2005, The journal of the American Academy of Psychiatry and the Law.

[20]  Albert A. Rizzo,et al.  Self-Reported Symptoms of Depression and PTSD Are Associated with Reduced Vowel Space in Screening Interviews , 2016, IEEE Transactions on Affective Computing.

[21]  Henny-Annie Bijleveld,et al.  Post-traumatic Stress Disorder and Stuttering: A Diagnostic Challenge in a Case Study , 2015 .

[22]  A. L. Edwards Note on the “correction for continuity” in testing the significance of the difference between correlated proportions , 1948, Psychometrika.

[23]  Julius Kunze,et al.  Transfer Learning for Speech Recognition on a Budget , 2017, Rep4NLP@ACL.

[24]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[25]  F. Supek,et al.  Posttraumatic stress disorder: diagnostic data analysis by data mining methodology. , 2007, Croatian medical journal.

[26]  I. Elamvazuthi,et al.  Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques , 2010, ArXiv.

[27]  B. Rothbaum,et al.  Behavioral/cognitive conceptualizations of post-traumatic stress disorder , 1989 .

[28]  R. Pitman,et al.  Post-traumatic stress disorder, hormones, and memory , 1989, Biological Psychiatry.

[29]  Bradley D Grinage Diagnosis and management of post-traumatic stress disorder. , 2003, American family physician.

[30]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[31]  Junran Zhang,et al.  Multimodal MRI-Based Classification of Trauma Survivors with and without Post-Traumatic Stress Disorder , 2016, Front. Neurosci.

[32]  Xiaodan Zhuang,et al.  Improving speech-based PTSD detection via multi-view learning , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).

[33]  Jennifer G. Dy,et al.  A Sparse Combined Regression-Classification Formulation for Learning a Physiological Alternative to Clinical Post-Traumatic Stress Disorder Scores , 2015, AAAI.

[34]  Xi Li,et al.  Stress and Emotion Classification using Jitter and Shimmer Features , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[35]  Shotaro Akaho,et al.  TrBagg: A Simple Transfer Learning Method and its Application to Personalization in Collaborative Tagging , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[36]  H. M. van der Ploeg,et al.  The assessment of posttraumatic stress disorder: with the Clinician Administered PTSD Scale: Dutch results. , 1994, Journal of clinical psychology.

[37]  Ekin Ekinci,et al.  An alternative evaluation of post traumatic stress disorder with machine learning methods , 2015, 2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA).

[38]  Ji-Hwan Kim,et al.  The use of prosody in a combined system for punctuation generation and speech recognition , 2001, INTERSPEECH.

[39]  Geoffrey Zweig,et al.  Recent advances in deep learning for speech research at Microsoft , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[40]  Rafael A. Calvo,et al.  Affect Detection: An Interdisciplinary Review of Models, Methods, and Their Applications , 2010, IEEE Transactions on Affective Computing.

[41]  Lorien Y. Pratt,et al.  Discriminability-Based Transfer between Neural Networks , 1992, NIPS.

[42]  Sandeep Sharma,et al.  Comparative Analysis of LPCC, MFCC and BFCC for the Recognition of Hindi Words using Artificial Neural Networks , 2014 .

[43]  Kristian Kersting,et al.  Transfer Learning via Relational Type Matching , 2015, 2015 IEEE International Conference on Data Mining.

[44]  Amit Srivastava,et al.  Multi-modal prediction of PTSD and stress indicators , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[45]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.

[46]  R. Kessler,et al.  How well can post‐traumatic stress disorder be predicted from pre‐trauma risk factors? An exploratory study in the WHO World Mental Health Surveys , 2014, World psychiatry : official journal of the World Psychiatric Association.

[47]  Nicholas B. Allen,et al.  Early prediction of major depression in adolescents using glottal wave characteristics and Teager Energy parameters , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[48]  F. Wilcoxon Individual Comparisons by Ranking Methods , 1945 .

[49]  Julia Hirschberg,et al.  Predicting Automatic Speech Recognition Performance Using Prosodic Cues , 2000, ANLP.