Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem

We developed algorithms to identify pregnant women with suicidal behavior using information extracted from clinical notes in electronic medical records by natural language processing (NLP). Using both codified data and NLP applied to unstructured clinical notes, we first screened pregnant women in Partners HealthCare for suicidal behavior. Psychiatrists manually reviewed clinical charts to identify relevant features for suicidal behavior and to obtain gold-standard labels. Using the adaptive elastic net, we developed algorithms to classify suicidal behavior, and we then validated the algorithms in an independent validation dataset. Of 275,843 women with codes related to pregnancy or delivery, 9,331 screened positive for suicidal behavior by either codified data (N = 196) or NLP (N = 9,145). Using expert-curated features, our algorithm achieved an area under the receiver operating characteristic curve (AUC) of 0.83. By fixing the positive predictive value (PPV) at a level comparable to that of diagnostic codes related to suicidal behavior (0.71), we obtained a sensitivity of 0.34, a specificity of 0.96, and a negative predictive value (NPV) of 0.83. The algorithm identified 1,423 pregnant women with suicidal behavior among the 9,331 who screened positive. Mining unstructured clinical notes with NLP yielded an 11-fold increase in the number of pregnant women identified with suicidal behavior, compared with relying solely on diagnostic codes.
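The classifiers in this study were built with the adaptive elastic net. As a rough illustration of the two-stage idea for a binary outcome, the sketch below fits an ordinary elastic-net logistic regression, turns its coefficients into per-feature L1 weights |b_j|^(-gamma), and refits with those weights. It is not the authors' code: their software, tuning, and cross-validation are not described here. The function names and the default values of `lam1`, `lam2`, and `gamma` are hypothetical, the solver is plain proximal gradient descent chosen for readability, and any post-fit rescaling of the coefficients is omitted.

```python
import numpy as np


def fit_weighted_enet_logistic(X, y, lam1, lam2, l1_weights, n_iter=5000):
    """Logistic regression with penalty
    lam1 * sum_j l1_weights[j] * |b_j| + (lam2 / 2) * ||b||^2,
    fitted by proximal gradient descent. The intercept is not penalized.
    (Hypothetical helper for illustration only.)"""
    n, p = X.shape
    b0, b = 0.0, np.zeros(p)
    # Step size 1/L, where L bounds the curvature of the smooth part:
    # the log-loss Hessian is at most 0.25 * Xa^T Xa / n, plus the ridge term.
    xa = np.column_stack([np.ones(n), X])
    step = 1.0 / (0.25 * np.linalg.norm(xa, 2) ** 2 / n + lam2)
    for _ in range(n_iter):
        # Clip the linear predictor so np.exp cannot overflow.
        eta = np.clip(b0 + X @ b, -30.0, 30.0)
        resid = 1.0 / (1.0 + np.exp(-eta)) - y
        grad_b = X.T @ resid / n + lam2 * b
        b0 -= step * resid.mean()
        z = b - step * grad_b
        # Soft-thresholding: the proximal operator of the weighted L1 penalty.
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam1 * l1_weights, 0.0)
    return b0, b


def adaptive_enet_logistic(X, y, lam1=0.01, lam2=0.01, gamma=1.0):
    """Two-stage adaptive elastic net for a 0/1 outcome (assumed settings)."""
    if lam1 <= 0:
        raise ValueError("lam1 must be positive")
    p = X.shape[1]
    # Stage 1: ordinary elastic net, every L1 weight equal to 1.
    _, b_init = fit_weighted_enet_logistic(X, y, lam1, lam2, np.ones(p))
    # Stage 2: adaptive weights |b_j|^(-gamma). Features zeroed in stage 1
    # get an infinite weight and therefore stay at exactly zero.
    weights = np.full(p, np.inf)
    keep = b_init != 0
    weights[keep] = np.abs(b_init[keep]) ** (-gamma)
    return fit_weighted_enet_logistic(X, y, lam1, lam2, weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 6))
    # Synthetic outcome driven by the first two features only.
    y = (X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=500) > 0).astype(float)
    intercept, coefs = adaptive_enet_logistic(X, y)
    print(round(intercept, 3), np.round(coefs, 3))
```

In a real analysis `lam1`, `lam2`, and `gamma` would be tuned, for example by cross-validation on the training charts. Giving stage-1 zeros an infinite weight is how the adaptive step prunes features while the ridge term keeps correlated features from being dropped arbitrarily.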

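The evaluation reported in this abstract has two parts: a threshold-free AUC, then a single operating point chosen so that the PPV is comparable to that of diagnostic codes, at which sensitivity, specificity, and NPV are read off. A minimal pure-Python sketch of that step follows, assuming each validation patient has a predicted score and a chart-review label (1 = suicidal behavior, 0 = not). The toy scores and labels are made up, and picking the lowest cut-off that reaches the target PPV is an assumption of this sketch rather than a detail stated in the abstract.

```python
def _ratio(num, den):
    """num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def auc(scores, labels):
    """Area under the ROC curve as the probability that a randomly chosen
    positive scores higher than a randomly chosen negative (ties count 1/2).
    This pairwise form is O(P * N), which is fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def metrics_at(threshold, scores, labels):
    """Confusion-matrix summaries when score >= threshold means 'positive'."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            tp += y == 1
            fp += y == 0
        else:
            fn += y == 1
            tn += y == 0
    return {
        "ppv": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "npv": _ratio(tn, tn + fn),
    }


def threshold_for_ppv(scores, labels, target_ppv):
    """Lowest cut-off whose PPV reaches target_ppv. Sensitivity can only fall
    as the cut-off rises, so this is the most sensitive qualifying threshold.
    PPV is not monotone in the cut-off, hence the scan over every score."""
    for t in sorted(set(scores)):
        if metrics_at(t, scores, labels)["ppv"] >= target_ppv:
            return t
    return None


# Toy example with made-up scores and chart-review labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc(scores, labels))                  # 0.75
t = threshold_for_ppv(scores, labels, 0.71)
print(t, metrics_at(t, scores, labels))     # 0.6, with all four metrics equal to 0.75
```

Fixing PPV rather than sensitivity mirrors the abstract's comparison: at the same PPV as diagnostic codes, the question becomes how many more true cases the NLP-based algorithm recovers.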