A Grid-based solution for management and analysis of microarrays in distributed experiments

Several systems have been presented in recent years to manage the complexity of large microarray experiments. Although good results have been achieved, most of these systems fall short in one or more areas. A Grid-based approach may provide a shared, standardized and reliable solution for the storage and analysis of biological data, maximizing the results of experimental efforts. A Grid framework has therefore been adopted, because of the need to access large amounts of distributed data remotely and to scale computational performance to terabyte datasets. Two different biological studies have been planned to highlight the benefits that can emerge from our Grid-based platform. The described environment relies on the storage and computational services provided by the gLite Grid middleware. The Grid environment can also exploit the added value of metadata to let users better classify and search experiments. A state-of-the-art Grid portal has been implemented to hide the complexity of the framework from end users and to give them easy access to the available services and data. The functional architecture of the portal is described. As a first test of system performance, a gene expression analysis has been performed on a dataset of Affymetrix GeneChip® Rat Expression Array RAE230A chips from the ArrayExpress database. The analysis comprises three steps: (i) group opening and image set uploading, (ii) normalization, and (iii) model-based gene expression estimation (based on the PM/MM difference model). Two Linux versions of the dChip software, one sequential and one parallel, have been developed to implement the analysis and have been tested on a cluster. The results show that parallelizing the analysis process and executing parallel jobs on distributed computational resources improves performance.
Moreover, the Grid environment has been tested both for uploading and accessing distributed datasets through the Grid middleware and for managing the execution of jobs on distributed computational resources. Results from the Grid test will be discussed in a further paper.
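
To make step (iii) concrete, the following is a minimal sketch of a PM/MM difference model fit for a single probe set. It assumes the multiplicative form usually associated with dChip, y_ij = PM_ij - MM_ij = theta_i * phi_j + eps_ij with the constraint sum_j phi_j^2 = J, where theta_i is the expression index on array i and phi_j is the sensitivity of probe j. The function name, its parameters and the plain alternating least-squares loop are illustrative assumptions, not the dChip code; the outlier detection and standard error computation that dChip performs are omitted.

```python
import numpy as np


def fit_pm_mm_difference(pm, mm, n_iter=50, tol=1e-8):
    """Fit y_ij = theta_i * phi_j + eps_ij by alternating least squares.

    pm, mm : arrays of shape (n_arrays, n_probes) for ONE probe set.
    Returns (theta, phi) with sum(phi**2) == n_probes.
    Hypothetical helper: no outlier handling, assumes PM - MM is not all zero.
    """
    y = np.asarray(pm, dtype=float) - np.asarray(mm, dtype=float)
    n_probes = y.shape[1]

    # Start from equal probe sensitivities, which satisfy the constraint.
    phi = np.ones(n_probes)
    theta = y @ phi / (phi @ phi)

    for _ in range(n_iter):
        # Least-squares update of phi with theta held fixed.
        phi = theta @ y / (theta @ theta)
        # Rescale so that sum(phi^2) equals the number of probes.
        phi *= np.sqrt(n_probes) / np.linalg.norm(phi)
        # Least-squares update of theta with phi held fixed.
        new_theta = y @ phi / (phi @ phi)
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break

    return theta, phi
```

In this sketch each probe set is fitted independently of the others, so the fits can be distributed over separate workers or jobs, which is one plausible reason a parallel version of the analysis scales well.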
