Identifying First Episodes of Psychosis in Psychiatric Patient Records using Machine Learning

Natural language processing is increasingly used to select cases for medical research from electronic health record databases, even though study inclusion criteria may be complex and the linguistic cues indicating eligibility may be subtle. Finding cases of first-episode psychosis raised a number of problems for automated approaches, providing an opportunity to explore how machine learning might be used to overcome them. The system delivered achieved an AUC of 0.85, allowing 95% of relevant cases to be identified while halving the work required to review cases manually. The techniques that made this possible are presented.
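
The trade-off between cases found and manual review effort can be illustrated with a minimal sketch. It assumes a classifier has already assigned each record a score, and that only records at or above a cutoff are passed to human reviewers. The function name `review_cutoff` and the toy scores and labels are hypothetical and are not taken from the system described here.

```python
def review_cutoff(scores, labels, target_recall=0.95):
    """Find the highest score cutoff whose recall meets target_recall.

    Returns (cutoff, fraction_of_records_to_review, recall_achieved).
    Hypothetical helper for illustration only.
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one relevant case")

    # Rank records from most to least likely to be relevant.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    found = 0
    i = 0
    while i < n:
        cutoff = ranked[i][0]
        # Records sharing a score are reviewed together.
        while i < n and ranked[i][0] == cutoff:
            found += ranked[i][1]
            i += 1
        recall = found / total_pos
        if recall >= target_recall:
            return cutoff, i / n, recall


# Toy data (invented): 10 records, 4 of them relevant (label 1).
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

cutoff, fraction_reviewed, recall = review_cutoff(toy_scores, toy_labels)
print(f"cutoff={cutoff}, review {fraction_reviewed:.0%} of records, recall={recall:.0%}")
```

On real data the cutoff would be chosen on a held-out annotated set, and the fraction of records left to review depends on how well the scores separate relevant from irrelevant cases.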
