DW4TR: A Data Warehouse for Translational Research

Linking the clinical and laboratory research domains is a key issue in translational research. Integrating clinicopathologic data alone is a major task, given the number of data elements involved, and in a translational research environment it is critical to make these data usable at the point of need. Individual systems have been developed to meet the needs of particular projects, although the need for a generalizable system has been recognized. Increased use of Electronic Medical Record data in translational research will demand a generalized system for integrating clinical data to support the study of a broad range of human diseases. To meet these needs, we have developed a system that supports multiple translational research projects. This system, the Data Warehouse for Translational Research (DW4TR), is based on a lightweight, patient-centric, modularly structured clinical data model and a specimen-centric molecular data model. The temporal relationships of the data are also part of the model. The data are accessed through an interface designed for general users, composed of an Aggregated Biomedical-Information Browser (ABB) and an Individual Subject Information Viewer (ISIV). The system was developed to support a breast cancer translational research program and has been extended to support a gynecological disease program. Further extensions of the DW4TR are underway. We believe that the DW4TR will play an important role in translational research across multiple disease types.
