Combining clinical and genomics queries using i2b2 – Three methods

We are fortunate to be living in an era of twin biomedical data surges: a burgeoning representation of human phenotypes in the medical records of our healthcare systems, and rapid technological advances in high-throughput sequencing. The difficulty of representing genomic data and its annotations has almost by itself led to the recognition of a biomedical “Big Data” challenge, and the complexity of healthcare data compounds the problem to the point that a coherent representation of both kinds of data on the same platform seems insuperably difficult. We investigated whether the Informatics for Integrating Biology and the Bedside (i2b2) translational software package can support complex, integrative genomic and clinical queries. We developed three data integration approaches: the first is based on the Sequence Ontology, the second on the tranSMART engine, and the third on CouchDB. These novel methods for representing and querying complex genomic and clinical data on the i2b2 platform are available today for advancing precision medicine.
