A Semantic Grid Infrastructure Enabling Integrated Access and Analysis of Multilevel Biomedical Data in Support of Postgenomic Clinical Trials on Cancer

This paper reports original results of the Advancing Clinico-Genomic Trials on Cancer integrated project, which focuses on the design and development of a European biomedical grid infrastructure in support of multicentric, postgenomic clinical trials (CTs) on cancer. Postgenomic CTs use multilevel clinical and genomic data, together with advanced computational analysis and visualization tools, to test hypotheses aimed at identifying the molecular causes of a disease and at stratifying patients for treatment. This paper presents the needs of users involved in postgenomic CTs and expresses them as scenarios, which drive the requirements engineering phase of the project. Subsequently, the initial architecture specified by the project is presented, and its services are classified and discussed. A key set of these services wraps heterogeneous clinical trial management systems and public biological databases. The main technological challenge, namely the design and development of semantically rich grid services, is also discussed. Meeting this objective requires extensive use of ontologies and metadata. The Master Ontology on Cancer, developed by the project, is presented, and our approach to developing the required metadata registries, which provide semantically rich information about the available data and computational services, is described. Finally, a short discussion of the work lying ahead is included.
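To make the role of a metadata registry more concrete, the following is a minimal sketch, not the project's actual registry or API. It assumes that each wrapped data source or computational service is annotated with a single ontology concept, and that a query for a general concept should also return services annotated with more specific concepts. The concept names, the `IS_A` hierarchy, and the `ServiceEntry`/`MetadataRegistry` classes are all hypothetical and are not taken from the Master Ontology on Cancer.

```python
from dataclasses import dataclass, field

# Hypothetical toy is-a hierarchy (child -> parent).
# This is NOT the Master Ontology on Cancer; names are illustrative only.
IS_A = {
    "GeneExpressionData": "GenomicData",
    "GenomicData": "BiomedicalData",
    "ClinicalTrialRecord": "ClinicalData",
    "ClinicalData": "BiomedicalData",
}


def subsumes(general: str, specific: str) -> bool:
    """Return True if `specific` is `general` or a descendant of it via is-a links."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = IS_A.get(current)  # walk one step up; None at the root
    return False


@dataclass
class ServiceEntry:
    name: str
    kind: str     # hypothetical: "data" for wrapped sources, "analysis" for tools
    concept: str  # ontology concept the service is annotated with


@dataclass
class MetadataRegistry:
    entries: list[ServiceEntry] = field(default_factory=list)

    def register(self, entry: ServiceEntry) -> None:
        self.entries.append(entry)

    def find(self, concept: str) -> list[str]:
        """Names of services annotated with `concept` or any of its subconcepts."""
        return [e.name for e in self.entries if subsumes(concept, e.concept)]


# Illustrative registrations (hypothetical service names).
registry = MetadataRegistry()
registry.register(ServiceEntry("ctms_wrapper", "data", "ClinicalTrialRecord"))
registry.register(ServiceEntry("microarray_db_wrapper", "data", "GeneExpressionData"))
registry.register(ServiceEntry("clustering_service", "analysis", "GenomicData"))

print(registry.find("GenomicData"))  # ['microarray_db_wrapper', 'clustering_service']
```

The point of routing queries through ontology concepts rather than service names is that a client asking for genomic data also discovers sources annotated with a more specific concept, without knowing in advance which wrappers exist.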
