IQRray, a new method for Affymetrix microarray quality control, and the homologous organ conservation score, a new benchmark method for quality control metrics

Motivation: Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The quality of the data obtained with this technology varies from experiment to experiment, and an efficient method for quality assessment is necessary to ensure their reliability.

Results: The lack of a good benchmark has hampered evaluation of existing methods for quality control. In this study, we propose a new independent quality metric that is based on evolutionary conservation of expression profiles. We show, using 11 large organ-specific datasets, that IQRray, a new quality metric that we developed, exhibits the highest correlation with this reference metric among the 14 metrics tested. IQRray outperforms other methods in identifying poor-quality arrays in datasets composed of arrays from many independent experiments. In contrast, methods designed for detecting outliers within a single experiment, such as Normalized Unscaled Standard Error and Relative Log Expression, performed poorly, because they cannot detect datasets containing only low-quality arrays and because their scores cannot be directly compared between experiments.

Availability and implementation: The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R.

Contact: Marta.Rosikiewicz@unil.ch

Supplementary information: Supplementary data are available at Bioinformatics online.
