GRISSOM Platform: Enabling Distributed Processing and Management of Biological Data Through Fusion of Grid and Web Technologies

Transcriptomic technologies play a critical role in the revolutionary changes reshaping biological research. Through the adoption of novel high-throughput instrumentation and advanced computational methodologies, an unprecedented wealth of quantitative data is being produced. Microarray experiments are high-throughput both in terms of data volume (they are data intensive) and processing complexity (they are computationally intensive). In this paper, we present Grids for In Silico Systems Biology and Medicine (GRISSOM), a web-based application that exploits grid infrastructures for the distributed processing and management of DNA microarray data (cDNA, Affymetrix, Illumina) through a generic, consistent computational analysis framework. GRISSOM performs versatile annotation and integrative analysis tasks using third-party application programming interfaces delivered as web services. Moreover, because it conforms to service-oriented architecture principles, it can be embedded in other biomedical processing workflows with the help of workflow enactment software such as Taverna Workbench, making access to its algorithms transparent and generic. GRISSOM aims to set a generic paradigm of efficient metamining that promotes translational research in biomedicine through the fusion of grid and semantic web computing technologies.
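Microarray processing suits grid distribution because many steps can run on each array independently. The sketch below illustrates that fan-out pattern under that assumption. It is not GRISSOM's pipeline. The per-array step (a log2 transform followed by median centering) and the function names `process_array` and `process_experiment` are hypothetical. A local thread pool stands in for job submission to grid middleware, which is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median
import math


def process_array(intensities):
    """Hypothetical per-array step: log2-transform raw intensities, then median-center them.

    Each array is handled independently, which is what makes the work easy to distribute.
    """
    if any(x <= 0 for x in intensities):
        raise ValueError("intensities must be positive for a log2 transform")
    logged = [math.log2(x) for x in intensities]
    center = median(logged)
    return [v - center for v in logged]


def process_experiment(arrays, max_workers=4):
    """Fan independent arrays out to a worker pool and collect results in input order.

    The thread pool is a local stand-in for dispatching one job per array to grid nodes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_array, arrays))


if __name__ == "__main__":
    experiment = [[2.0, 4.0, 8.0], [1.0, 16.0, 4.0]]
    print(process_experiment(experiment))
```

`ThreadPoolExecutor.map` returns results in the same order as the input arrays. Collecting results therefore needs no extra bookkeeping, much as a grid front end would reassemble per-array job outputs before any cross-array analysis.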