The Potential of Research Drawing on Clinical Free Text to Bring Benefits to Patients in the United Kingdom: A Systematic Review of the Literature

Background: The analysis of clinical free text from patient records for research has the potential to contribute to the medical evidence base, but access to clinical free text is frequently denied by data custodians who perceive the privacy risks of data sharing to be too high. Engagement activities with patients and regulators, in which views on sharing clinical free text data for research were discussed, have identified that stakeholders would like to understand the potential clinical benefits that could be achieved if access to free text for clinical research were improved. We aimed to systematically review all UK research studies which used clinical free text and report direct or potential benefits to patients, synthesizing possible benefits into an easy-to-communicate taxonomy for public engagement and policy discussions.

Methods: We conducted a systematic search for articles reporting primary research that used clinical free text drawn from UK health record databases, that reported a benefit or potential benefit for patients actionable in a clinical environment or health service, and that was not solely concerned with methods development or data quality improvement. We screened eligible papers and thematically analyzed the information about clinical benefits reported in each paper to create a taxonomy of benefits.

Results: We identified 43 papers and derived five themes of benefits: health-care quality or services improvement, observational risk factor-outcome research, drug prescribing safety, case-finding for clinical trials, and development of clinical decision support. Five papers compared study quality with and without free text and found an improvement in accuracy when free text was included in analytical models.

Conclusions: Findings will help stakeholders weigh the potential benefits of free text research against perceived risks to patient privacy. The taxonomy can be used to aid public and policy discussions, and the identified studies could form a public-facing repository which will help the health-care text analysis research community better communicate the impact of their work.


[39]  R. Stewart,et al.  Associations of acetylcholinesterase inhibitor treatment with reduced mortality in Alzheimer's disease: a retrospective survival analysis , 2018, Age and ageing.

[40]  Darren Lunn,et al.  Data resource profile: Clinical Practice Research Datalink (CPRD) Aurum , 2019, International journal of epidemiology.

[41]  Donia Scott,et al.  Extracting information from the text of electronic medical records to improve case detection: a systematic review , 2016, J. Am. Medical Informatics Assoc..

[42]  Katherine E Henson,et al.  Data Resource Profile: National Cancer Registration Dataset in England , 2019, International journal of epidemiology.

[43]  Lynette Hirschman,et al.  Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text , 2013, J. Am. Medical Informatics Assoc..

[44]  E. Ford,et al.  Should free-text data in electronic medical records be shared for research? A citizens’ jury study in the UK , 2020, Journal of Medical Ethics.

[45]  M. Henderson,et al.  Cervical and breast cancer screening uptake among women with serious mental illness: a data linkage study , 2016, BMC Cancer.

[46]  T. Craig,et al.  Khat use among Somali mental health service users in South London , 2012, Social Psychiatry and Psychiatric Epidemiology.

[47]  K. Dean,et al.  Predictors of Mental Health Review Tribunal (MHRT) outcome in a forensic inpatient population: a prospective cohort study , 2017, BMC Psychiatry.

[48]  A. Maguire,et al.  Identifying rare diseases using electronic medical records: the example of allergic bronchopulmonary aspergillosis , 2017, Pharmacoepidemiology and drug safety.

[49]  R. Stewart,et al.  Delays before Diagnosis and Initiation of Treatment in Patients Presenting to Mental Health Services with Bipolar Disorder , 2015, PloS one.

[50]  Alistair E. W. Johnson,et al.  Deidentification of free-text medical records using pre-trained bidirectional transformers , 2020, CHIL.

[51]  Clare L. Taylor,et al.  Relapse in the first three months postpartum in women with history of serious mental illness , 2019, Schizophrenia Research.

[52]  R. Stewart,et al.  Services for people at high risk improve outcomes in patients with first episode psychosis , 2015, Acta psychiatrica Scandinavica.

[53]  Angus Roberts,et al.  Negative symptoms in schizophrenia: a study in a large clinical sample of patients using a novel automated method , 2015, BMJ Open.

[54]  A. Bourke,et al.  Generalisability of The Health Improvement Network (THIN) database: demographics, chronic disease prevalence and mortality rates. , 2011, Informatics in primary care.

[55]  K. Barraclough,et al.  Is omission of free text records a possible source of data loss and bias in Clinical Practice Research Datalink studies? A case–control study , 2016, BMJ Open.

[56]  Sumithra Velupillai,et al.  De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields , 2010, J. Biomed. Semant..

[57]  R. Stewart,et al.  Identification of the delivery of cognitive behavioural therapy for psychosis (CBTp) using a cross-sectional sample from electronic health records and open-text information in a large UK-based mental health case register , 2017, BMJ Open.

[58]  Bradley A Malin,et al.  The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight , 2019, J. Am. Medical Informatics Assoc..

[59]  R. Stewart,et al.  Associations of Neuropsychiatric Symptoms and Antidepressant Prescription with Survival in Alzheimer's Disease. , 2017, Journal of the American Medical Directors Association.

[60]  J. MacCabe,et al.  Antipsychotic polypharmacy prescribing and risk of hospital readmission , 2017, Psychopharmacology.

[61]  Cyril Grouin,et al.  Is it possible to recover personal health information from an automatically de-identified corpus of French EHRs? , 2015, Louhi@EMNLP.

[62]  I. Perez-Diez,et al.  De-identifying Spanish medical texts - Named Entity Recognition applied to radiology reports , 2020, medRxiv.

[63]  A. David,et al.  Associations of homelessness and residential mobility with length of stay after acute psychiatric admission , 2012, BMC Psychiatry.

[64]  R. Lyons,et al.  Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system , 2019, BMJ Open.

[65]  Shuying Shen,et al.  Can Physicians Recognize Their Own Patients in De-identified Notes? , 2014, MIE.

[66]  Lynn A. Karoly,et al.  Health Insurance Portability and Accountability Act of 1996 (HIPAA) Administrative Simplification , 2010, Practice Management Consultant.

[67]  Shweta,et al.  A Recurrent Neural Network Architecture for De-identifying Clinical Records , 2016, ICON.

[68]  Kostas Pantazos,et al.  De-identifying an EHR Database - Anonymity, Correctness and Readability of the Medical Record , 2011, MIE.

[69]  M. Hotopf,et al.  Mortality of people with chronic fatigue syndrome: a retrospective cohort study in England and Wales from the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Clinical Record Interactive Search (CRIS) Register , 2016, The Lancet.

[70]  Chin-Kuo Chang,et al.  Hospital admissions for respiratory system diseases in adults with intellectual disabilities in Southeast London: a register-based cohort study , 2017, BMJ Open.

[71]  K. Bhaskaran,et al.  Data Resource Profile: Clinical Practice Research Datalink (CPRD) , 2015, International journal of epidemiology.

[72]  R. Stewart,et al.  Polypharmacy in people with dementia: Associations with adverse health outcomes , 2018, Experimental Gerontology.

[73]  S. Meystre,et al.  Automatic de-identification of textual documents in the electronic health record: a review of recent research , 2010, BMC medical research methodology.

[74]  Deborah A. Nichols,et al.  Strategies for De-identification and Anonymization of Electronic Health Record Data for Use in Multicenter Research Studies , 2012, Medical care.

[75]  R. Stewart,et al.  The Maudsley Biomedical Research Centre (BRC) data linkage service user and carer advisory group: creating and sustaining a successful patient and public involvement group to guide research in a complex area , 2019, Research Involvement and Engagement.

[76]  Kia-Chong Chua,et al.  Predictors of care home and hospital admissions and their costs for older people with Alzheimer's disease: findings from a large London case register , 2016, BMJ Open.

[77]  J. Strang,et al.  Excess overdose mortality immediately following transfer of patients and their care as well as after cessation of opioid substitution therapy , 2018, Addiction.

[78]  R. Stewart,et al.  Recorded poor insight as a predictor of service use outcomes: cohort study of patients with first-episode psychosis in a large mental healthcare database , 2019, BMJ Open.

[79]  Tim Williams,et al.  Natural language processing for disease phenotyping in UK primary care records for research: a pilot study in myocardial infarction and death , 2019, Journal of Biomedical Semantics.

[80]  R. Stewart,et al.  ETHNIC DIFFERENCES IN COGNITION AND AGE IN PEOPLE DIAGNOSED WITH DEMENTIA: A STUDY OF ELECTRONIC HEALTH RECORDS IN TWO LARGE MENTAL HEALTH CARE PROVIDERS , 2019, Alzheimer's & Dementia.