Metadata-driven Clinical Data Loading into i2b2 for Clinical and Translational Science Institutes

Clinical and Translational Science Award (CTSA) recipients need to create research data marts from their clinical data warehouses to support research data networks built on i2b2 and SHRINE technologies. These data marts may have different data requirements and representations, which would otherwise require a separate extract, transform and load (ETL) process for populating each mart. Maintaining duplicative procedural logic for each ETL process is onerous. We have created an entirely metadata-driven ETL process that can be customized for different data marts through separate configurations, each stored in an extension of i2b2's ontology database schema. We added this capability to our previously reported, open-source Eureka! Clinical Analytics software. The same software has created i2b2 data marts for several projects, the largest being the nascent Accrual to Clinical Trials (ACT) network, for which it has loaded over 147 million facts about 1.2 million patients.
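To illustrate the idea of one shared ETL code path whose behavior is driven entirely by per-mart metadata, here is a minimal sketch. It is not Eureka!'s implementation (Eureka! is a Java system that stores its configurations in an extended i2b2 ontology schema). The names `Fact`, `MART_CONFIGS` and `load_facts` are hypothetical, the configurations are plain Python dicts rather than database tables, and `Fact` is a simplified stand-in for a few columns of i2b2's `observation_fact` table. The source codes `LAB_GLU` and `RX_MET` are invented local codes mapped to illustrative LOINC and RxNorm codes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Fact:
    """Simplified stand-in for a row of i2b2's observation_fact table (assumption)."""
    patient_num: int
    concept_cd: str
    start_date: str
    nval_num: Optional[float]


# Hypothetical per-mart configurations: each maps a local source code to the
# concept code that mart expects. In the paper these live in an extension of
# i2b2's ontology schema; here they are plain dicts for illustration only.
MART_CONFIGS = {
    "act": {"LAB_GLU": "LOINC:2345-7", "RX_MET": "RXNORM:6809"},
    "local_research": {"LAB_GLU": "LOCAL:GLUCOSE"},
}


def load_facts(records: Iterable[dict], mart: str) -> List[Fact]:
    """Generic transform step: the code is identical for every mart, and only
    the metadata selected by `mart` changes which facts are emitted."""
    mapping = MART_CONFIGS[mart]
    facts = []
    for rec in records:
        concept = mapping.get(rec["code"])
        if concept is None:
            continue  # this mart's configuration does not request the code
        facts.append(Fact(rec["patient"], concept, rec["date"], rec.get("value")))
    return facts
```

Running the same records through two configurations yields two differently shaped marts without touching the loader: with `records = [{"patient": 1, "code": "LAB_GLU", "date": "2015-01-02", "value": 98.0}, {"patient": 1, "code": "RX_MET", "date": "2015-01-02"}]`, `load_facts(records, "act")` emits both facts with standard codes, while `load_facts(records, "local_research")` emits only the glucose fact under a local code. Supporting a new mart then means adding configuration rather than writing new procedural logic.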
