A Temporal Abstraction-based Extract, Transform and Load Process for Creating Registry Databases for Research

In the Clinical and Translational Science Award (CTSA) era, there is great interest in aggregating and comparing populations across institutions. These institutions likely represent data differently in their clinical data warehouses and other databases, and clinical data warehouses are frequently structured in a generalized way that supports many constituencies. For research, these heterogeneous data need to be transformed into a shared representation, then categorized and interpreted to optimize the representation for investigators. We are addressing this need by extending an existing temporal abstraction-based clinical database query system, PROTEMPA. The extended system allows users to specify data types of interest in federated databases, extracts the data into a shared representation, transforms it through categorization and interpretation, and loads it into a registry database that can be refreshed. Such a registry’s access control, data representation, and query tools can be tailored to the needs of research while keeping local databases as the source of truth.
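
The extract, transform, and load steps described above can be illustrated with a minimal Python sketch. It does not use PROTEMPA or reproduce its interfaces. The two source sites, their local codes, the approximate mmol/L to mg/dL conversion factor, the 200 mg/dL threshold, and the `glucose_state` table are all hypothetical and chosen only for illustration. The sketch maps two differently coded sources into one shared representation, abstracts consecutive readings in the same category into intervals, and reloads a registry table from scratch so that the local sources remain the source of truth.

```python
import sqlite3
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Observation:
    """One record in the shared representation (hypothetical schema)."""
    patient_id: str
    timestamp: int   # day number relative to an arbitrary origin
    value: float     # glucose in mg/dL


# Hypothetical local data: site A stores mg/dL, site B stores mmol/L.
SITE_A = [("p1", "GLU", 1, 95.0), ("p1", "GLU", 2, 210.0), ("p1", "GLU", 3, 230.0)]
SITE_B = [("p2", "glucose_mmol", 1, 5.0), ("p2", "glucose_mmol", 4, 12.0)]


def extract():
    """Extract: map each site's local rows into the shared representation."""
    for pid, _code, day, value in SITE_A:
        yield Observation(pid, day, value)
    for pid, _code, day, value in SITE_B:
        # Approximate unit conversion (assumed factor of 18.0).
        yield Observation(pid, day, value * 18.0)


def categorize(obs):
    """Categorize a reading; the 200 mg/dL cutoff is illustrative only."""
    return "HIGH" if obs.value >= 200.0 else "NORMAL"


def abstract_intervals(observations):
    """Transform: merge consecutive same-category readings into intervals."""
    ordered = sorted(observations, key=lambda o: (o.patient_id, o.timestamp))
    intervals = []
    for pid, patient_obs in groupby(ordered, key=lambda o: o.patient_id):
        for label, run in groupby(patient_obs, key=categorize):
            run = list(run)
            intervals.append((pid, label, run[0].timestamp, run[-1].timestamp))
    return intervals


def load(conn, intervals):
    """Load: rebuild the registry table so that a refresh replaces old rows."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS glucose_state")
        conn.execute(
            "CREATE TABLE glucose_state "
            "(patient_id TEXT, state TEXT, start_day INTEGER, end_day INTEGER)"
        )
        conn.executemany("INSERT INTO glucose_state VALUES (?, ?, ?, ?)", intervals)


def run_etl(conn):
    """Run all three steps and return the registry rows in a stable order."""
    load(conn, abstract_intervals(extract()))
    query = "SELECT * FROM glucose_state ORDER BY patient_id, start_day"
    return conn.execute(query).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` builds the registry in an in-memory database. Calling it again on the same connection rebuilds the table rather than appending to it, which is one simple way to model a refreshable registry.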
