Automatic classification of RDoC positive valence severity with a neural network.

OBJECTIVE: Our objective was to develop a machine learning-based system to determine the severity of Positive Valence symptoms for a patient, based on information included in their initial psychiatric evaluation. Severity was rated by experts on an ordinal scale of 0-3 as follows: 0 (absent = no symptoms), 1 (mild = modest significance), 2 (moderate = requires treatment), 3 (severe = causes substantial impairment).

MATERIALS AND METHODS: We treated the task of assigning Positive Valence severity as a text classification problem. During development, we experimented with regularized multinomial logistic regression classifiers, gradient boosted trees, and feedforward, fully connected neural networks. We found both regularization and feature selection via mutual information to be very important in preventing models from overfitting the data. Our best configuration was a neural network with three fully connected hidden layers with rectified linear unit activations.

RESULTS: Our best performing system achieved a score of 77.86%. The evaluation metric is an inverse normalization of the Mean Absolute Error presented as a percentage between 0 and 100, where 100 indicates the best possible performance. Error analysis showed that 90% of the system errors involved neighboring severity categories.

CONCLUSION: Machine learning text classification techniques with feature selection can be trained to recognize broad differences in Positive Valence symptom severity with a modest amount of training data (in this case 600 documents, 167 of which were unannotated). An increase in the amount of annotated data can increase the accuracy of symptom severity classification by several percentage points. Additional features and/or a larger training corpus may further improve accuracy.
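
The abstract does not spell out implementation details, but the described configuration maps onto a standard text-classification pipeline. The sketch below is a minimal illustration, assuming TF-IDF n-gram features, mutual-information feature selection, and an L2-regularized feedforward network with three ReLU hidden layers; the feature set, layer sizes, feature count, and regularization strength shown here are illustrative placeholders rather than the authors' actual settings.

```python
# Minimal sketch of the described pipeline (scikit-learn); hyperparameters are
# illustrative, not the settings reported in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    # Bag-of-words / TF-IDF features (assumed; the abstract does not name the feature set).
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    # Feature selection via mutual information, which the paper found important
    # for limiting overfitting.
    ("select", SelectKBest(mutual_info_classif, k=1000)),
    # Feedforward network with three fully connected ReLU hidden layers and
    # L2 regularization (alpha); layer sizes are placeholders.
    ("mlp", MLPClassifier(hidden_layer_sizes=(256, 128, 64),
                          activation="relu",
                          alpha=1e-3,
                          max_iter=500,
                          random_state=0)),
])

# docs_train: list of evaluation-note texts; labels_train: severity scores in {0, 1, 2, 3}
# pipeline.fit(docs_train, labels_train)
# predictions = pipeline.predict(docs_test)
```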
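
One plausible reading of the evaluation metric, assuming the mean absolute error is divided by the maximum possible error on the 0-3 scale and then inverted, is sketched below; the shared task's official definition (for example, whether the MAE is macro-averaged over severity classes) may differ.

```python
def inverse_normalized_mae(y_true, y_pred, max_error=3.0):
    """Inverse-normalized MAE as a 0-100 score (100 = no error).

    Assumes normalization by the maximum possible error on the 0-3 ordinal
    scale; this is an illustrative reading, not the official task definition.
    """
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return (1.0 - mae / max_error) * 100.0

# Example: gold = [0, 1, 2, 3], predicted = [0, 2, 2, 3] -> MAE = 0.25 -> score ~ 91.67
print(inverse_normalized_mae([0, 1, 2, 3], [0, 2, 2, 3]))
```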
