Finding Important Terms for Patients in Their Electronic Health Records: A Learning-to-Rank Approach Using Expert Annotations

Background: Many health organizations allow patients to access their own electronic health record (EHR) notes through online patient portals as a way to enhance patient-centered care. However, EHR notes are typically long and contain abundant medical jargon that can be difficult for patients to understand. In addition, many medical terms in patients’ notes are not directly related to their health care needs. One way to help patients better comprehend their own notes is to reduce information overload and help them focus on medical terms that matter most to them. Interventions can then be developed by giving them targeted education to improve their EHR comprehension and the quality of care.

Objective: We aimed to develop a supervised natural language processing (NLP) system called Finding impOrtant medical Concepts most Useful to patientS (FOCUS) that automatically identifies and ranks medical terms in EHR notes based on their importance to the patients.

Methods: First, we built an expert-annotated corpus. For each EHR note, 2 physicians independently identified medical terms important to the patient. Using the physicians’ agreement as the gold standard, we developed and evaluated FOCUS. FOCUS first identifies candidate terms from each EHR note using MetaMap and then ranks the terms using a support vector machine-based learn-to-rank algorithm. We explored rich learning features, including distributed word representation, Unified Medical Language System semantic type, topic features, and features derived from consumer health vocabulary. We compared FOCUS with 2 strong baseline NLP systems.

Results: Physicians annotated 90 EHR notes and identified a mean of 9 (SD 5) important terms per note. The Cohen’s kappa annotation agreement was .51. The 10-fold cross-validation results show that FOCUS achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.940 for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FOCUS for identifying important terms from EHR notes was 0.866 AUC-ROC. Both performance scores significantly exceeded the corresponding baseline system scores (P<.001). Rich learning features contributed to FOCUS’s performance substantially.

Conclusions: FOCUS can automatically rank terms from EHR notes based on their importance to patients. It may help develop future interventions that improve quality of care.
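The two statistics reported in the Results, Cohen's kappa for annotator agreement and AUC-ROC for ranking quality, can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the function names `cohen_kappa` and `auc_roc` and the toy labels and scores in the usage note are assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def auc_roc(scores, labels):
    """AUC-ROC as the probability that a randomly chosen important term
    (label 1) is scored above a randomly chosen unimportant term (label 0).
    Tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = term marked important, 0 = not important.
    physician_1 = [1, 1, 0, 0, 1, 0]
    physician_2 = [1, 0, 0, 0, 1, 1]
    print(round(cohen_kappa(physician_1, physician_2), 3))

    # Hypothetical ranker scores for five candidate terms in one note.
    scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    gold = [1, 0, 1, 0, 0]
    print(round(auc_roc(scores, gold), 3))
```

The pairwise form of AUC-ROC is quadratic in the number of candidate terms, which is acceptable for the handful of terms in a single note; a rank-based formulation would be preferable for large candidate sets.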
