Leveraging Electronic Health Records for Phenotyping

The adoption of electronic health records (EHRs), and the resulting increase in clinical data available in electronic form, has created the opportunity to use those data for research. Although using EHR data for research has been suggested for decades, specific changes in research capabilities over the last few years have greatly expanded its potential. Genome-wide association studies require large numbers of genotype-phenotype pairs to identify associations. The cost of genotyping has dropped to the point that the cost of creating phenotypes can now be the limiting factor in a study. Because EHR data can be extracted inexpensively from existing records, they can overcome that limit. Other research activities, such as comparative effectiveness research and the creation of a learning healthcare system, also benefit from analysis of EHR data. Through initiatives such as eMERGE and i2b2, researchers have advanced both the capabilities for, and the understanding of, EHR-based phenotyping. Challenges remain, and overcoming them will be critical to realizing the promise of secondary use of EHR data.
