Flexible data integration and ontology-based data access to medical records

The ASSIST project aims to facilitate cervical cancer research by integrating medical records that contain both phenotypic and genotypic data and that reside in different medical centres or hospitals. The goal of ASSIST is to enable medical researchers to evaluate medical hypotheses and conduct association studies in an intuitive manner, and thereby to identify risk factors that can be used at the point of care to identify women at high risk of developing cervical cancer. This paper presents the current status of the ASSIST medical knowledgebase. In particular, we discuss the challenges faced in constructing the ASSIST integrated resource and in enabling query processing through a domain ontology, and the solutions provided using the AutoMed heterogeneous data integration system. We focus on data cleansing, on the integration of relational medical data sources into an independent domain ontology, and on query processing. Of particular interest is the challenge of providing an easily maintainable integrated resource in a setting where the data sources and the domain ontology are developed independently and are therefore both highly likely to evolve over time.
