Outcomes of gene association analysis of cancer microarray data are impacted by pre-processing algorithms

Gene association analysis of cancer microarray data provides a wealth of information on gene expression patterns and cancer pathways, which aids the identification of potential biomarkers for cancer diagnosis, prognosis, and prediction of therapeutic responsiveness. However, achieving these biological and clinical objectives relies heavily on the functional capabilities and accuracy of the analytical tools used to mine cancer microarray gene expression profiles. Many pre-processing algorithms exist for analyzing Affymetrix microarray gene expression data. Previous studies have evaluated how accurately these algorithms determine gene expression, using a variety of spike-in and experimental data sets. However, the variation among these pre-processing algorithms in detecting differentially expressed genes on a single cancer dataset has not been examined in a systems-level evaluation. In this study, we assessed the comparability of, and the level of variation among, PLIER, GCRMA, RMA and MAS5 in their capability to detect differentially expressed genes.
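One simple way to quantify how much the lists of differentially expressed genes differ between pre-processing algorithms is a pairwise set-overlap score. The sketch below uses the Jaccard index for this purpose; the choice of metric, the function names, and the gene identifiers are all assumptions made for illustration and are not taken from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets: size of intersection over size of union.

    Defined here as 1.0 when both sets are empty (an assumed convention).
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(gene_lists):
    """Return {(alg1, alg2): jaccard} for every unordered pair of algorithms.

    Pairs are keyed in alphabetical order of the algorithm names.
    """
    return {
        (x, y): jaccard(gene_lists[x], gene_lists[y])
        for x, y in combinations(sorted(gene_lists), 2)
    }


# Placeholder gene identifiers, invented for illustration only.
# They are NOT results from the study.
toy_lists = {
    "PLIER": {"g1", "g2", "g3", "g4"},
    "GCRMA": {"g2", "g3", "g5"},
    "RMA": {"g2", "g3", "g4"},
    "MAS5": {"g1", "g6"},
}

overlap = pairwise_overlap(toy_lists)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

A score near 1 means two algorithms flag almost the same genes, and a score near 0 means their lists barely intersect. In practice each list would come from running the same differential-expression test on the output of each pre-processing algorithm.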

[1]  Y. Benjamini,et al.  More powerful procedures for multiple significance testing. , 1990, Statistics in medicine.

[2]  John D. Storey,et al.  Statistical significance for genomewide studies , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[3]  K. Hui,et al.  Identification and Validation of a Novel Gene Signature Associated with the Recurrence of Human Hepatocellular Carcinoma , 2007, Clinical Cancer Research.

[4]  Rafael A. Irizarry,et al.  Stochastic models inspired by hybridization theory for short oligonucleotide arrays , 2004, J. Comput. Biol..

[5]  Terence P. Speed,et al.  A benchmark for Affymetrix GeneChip expression measures , 2004, Bioinform..

[6]  Rafael A Irizarry,et al.  Exploration, normalization, and summaries of high density oligonucleotide array probe level data. , 2003, Biostatistics.

[7]  Maqc Consortium The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements , 2006, Nature Biotechnology.

[8]  Cheng Li,et al.  Adjusting batch effects in microarray expression data using empirical Bayes methods. , 2007, Biostatistics.

[9]  C. Schlötterer,et al.  Comparison of algorithms for the analysis of Affymetrix microarray data as evaluated by co-expression of genes in known operons , 2006, Nucleic acids research.

[10]  Stephen L George,et al.  Statistical Issues in Translational Cancer Research , 2008, Clinical Cancer Research.

[11]  Richard Simon,et al.  What should physicians look for in evaluating prognostic gene-expression signatures? , 2010, Nature Reviews Clinical Oncology.

[12]  Wei-Min Liu,et al.  Robust estimators for expression analysis , 2002, Bioinform..

[13]  R. Tibshirani,et al.  Significance analysis of microarrays applied to the ionizing radiation response , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[14]  G. Church,et al.  Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset , 2005, Genome Biology.

[15]  Z. Szallasi,et al.  Correction of technical bias in clinical microarray data improves concordance with known biological information , 2008, Genome Biology.

[16]  Z. Szallasi,et al.  Evaluation of Microarray Preprocessing Algorithms Based on Concordance with RT-PCR in Clinical Samples , 2009, PloS one.