MAAMD: a workflow to standardize meta-analyses and comparison of Affymetrix microarray data

Background

Mandatory deposit of raw microarray data files for public access, prior to study publication, provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analysis of raw microarray data files (e.g. Affymetrix CEL files) can be time-consuming and complex, and requires fundamental computational and bioinformatics skills. The development of analytical workflows to automate these tasks simplifies processing, improves efficiency, and standardizes multiple and sequential analyses. Once installed, workflows facilitate the tedious steps required to run rapid intra- and inter-dataset comparisons.

Results

We developed a workflow in Kepler to facilitate and standardize Meta-Analysis of Affymetrix Microarray Data (MAAMD). Two freely available stand-alone software tools, R and AltAnalyze, were embedded in MAAMD. The inputs of MAAMD are user-editable CSV files, which contain sample information and parameters describing the locations of input files and required tools. MAAMD was tested by analyzing 4 different GEO datasets from mice and Drosophila. MAAMD automates data downloading, data organization, data quality control assessment, differential gene expression analysis, clustering analysis, pathway visualization, gene-set enrichment analysis, and cross-species orthologous-gene comparisons. MAAMD was used to identify gene orthologues responding to hypoxia or hyperoxia in both mice and Drosophila. The entire set of analyses for 4 datasets (34 total microarrays) finished in approximately one hour.

Conclusions

MAAMD saves time, minimizes the required computer skills, and offers a standardized procedure for users to analyze microarray datasets and make new intra- and inter-dataset comparisons.
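As an illustration of the cross-species orthologous-gene comparison mentioned above, the following is a minimal Python sketch, not MAAMD's actual implementation (MAAMD embeds R and AltAnalyze in Kepler). The function name, the gene identifiers, and the orthologue table are all hypothetical placeholders; the sketch only shows the general idea of keeping orthologue pairs whose members are differentially expressed in both species.

```python
def shared_orthologue_responders(mouse_hits, fly_hits, orthologue_pairs):
    """Return sorted (mouse_gene, fly_gene) orthologue pairs in which
    both genes appear in their species' differentially expressed list."""
    mouse_set = set(mouse_hits)
    fly_set = set(fly_hits)
    return sorted(
        (m, f) for m, f in orthologue_pairs
        if m in mouse_set and f in fly_set
    )


# Hypothetical toy inputs: identifiers are placeholders, not real results.
mouse_hits = ["GeneA_m", "GeneB_m", "GeneC_m"]   # responders in mouse
fly_hits = ["GeneA_f", "GeneD_f"]                # responders in Drosophila
orthologue_pairs = [                             # assumed orthologue table
    ("GeneA_m", "GeneA_f"),
    ("GeneB_m", "GeneB_f"),
    ("GeneC_m", "GeneD_f"),
]

shared = shared_orthologue_responders(mouse_hits, fly_hits, orthologue_pairs)
print(shared)
```

In this toy example, only the pairs whose mouse and fly members both appear in the responder lists are kept; the pair involving `GeneB_f` is dropped because that gene is not in the fly list.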
