Asterias: A Parallelized Web-based Suite for the Analysis of Expression and aCGH Data

The analysis of expression and CGH arrays plays a central role in the study of complex diseases, especially cancer: it is used to find markers for early diagnosis and prognosis, to choose an optimal therapy, and to increase our understanding of cancer development and metastasis. Asterias (http://www.asterias.info) is an integrated collection of freely accessible web tools for the analysis of gene expression and array CGH (aCGH) data. Most of the tools use parallel computing (via MPI) and run on a server with 60 CPUs; compared with a desktop application, or with a server-based application that is not parallelized, parallelization provides speed-ups of up to a factor of 50. Most of our applications let the user obtain additional information for user-selected genes (chromosomal location, PubMed IDs, Gene Ontology terms, etc.) through clickable links in tables and/or figures. Our tools cover: normalization of expression and aCGH data (DNMAD); conversion between different types of gene/clone and protein identifiers (IDconverter/IDClight); filtering and imputation (preP); finding differentially expressed genes related to patient class and survival data (Pomelo II); searching for models of class prediction (Tnasas); using random forests to search for minimal models for class prediction or for large subsets of genes with predictive capacity (GeneSrF); searching for molecular signatures and predictive genes with survival data (SignS); and detecting regions of genomic DNA gain or loss (ADaCGH). The ability to send results between applications, access to additional functional information, and parallelized computation make our suite unique and exploit features available only to web-based applications.
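To make the parallelization claim concrete, the following is a minimal sketch of how a per-gene analysis might be divided among workers. It is not the Asterias implementation: the abstract states only that MPI is used. The sketch assumes that genes can be processed independently, uses a Welch t statistic as a stand-in for whatever test a tool actually runs, and accepts any `map`-like function in place of MPI scatter/gather. All function names (`split_blocks`, `t_statistic`, `process_block`, `parallel_t`) are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def split_blocks(n_items, n_workers):
    """Return (start, stop) index pairs giving each worker a near-equal block."""
    base, extra = divmod(n_items, n_workers)
    blocks, start = [], 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def t_statistic(values, labels):
    """Welch t statistic for one gene, comparing class 0 against class 1.

    Assumes each class has at least two samples and nonzero variance.
    """
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


def process_block(task):
    """Work done by a single worker: test every gene in its block."""
    rows, labels = task
    return [t_statistic(row, labels) for row in rows]


def parallel_t(matrix, labels, n_workers, map_fn=map):
    """Scatter gene rows into blocks, apply map_fn, gather results in order.

    map_fn defaults to the serial built-in map; an order-preserving parallel
    map (for example ProcessPoolExecutor.map) can be passed instead.
    """
    tasks = [(matrix[s:e], labels) for s, e in split_blocks(len(matrix), n_workers)]
    return [t for block in map_fn(process_block, tasks) for t in block]
```

Passing `pool.map` from a `concurrent.futures.ProcessPoolExecutor` as `map_fn` would run the blocks in separate processes on one machine. When per-gene work dominates and communication is cheap, speed-up can approach the number of workers, which is consistent with the reported factor of up to 50 on 60 CPUs.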
