Ontology Based Integration of Distributed and Heterogeneous Data Sources in ACGT

In this work, we describe the set of tools comprising the Data Access Infrastructure within Advancing Clinico-Genomic Trials on Cancer (ACGT), an R&D project funded in part by the European Commission. This infrastructure aims to improve post-genomic clinical trials by providing seamless access to integrated clinical, genetic, and image databases. A data access layer based on OGSA-DAI has been developed to cope with syntactic heterogeneities among databases. The semantic problems that arise in data sources of different nature are tackled by two core tools, namely the Semantic Mediator and the Master Ontology on Cancer. The ontology serves as a common semantic framework, modeling the domain and supporting the homogenization of the data. SPARQL has been selected as the query language for both the Data Access Services and the Mediator. Two experiments, integrating clinical and DICOM image databases, have been carried out to test the suitability of the selected approach.