Comprehensive detection of cancer gene expression profiles and gene networks is impacted by the choice of pre-processing algorithm and gene-selection method

Pre-processing algorithms (PPAs) and gene-selection methods (GSMs) are commonly employed to select differentially expressed genes (DEGs) from microarray data. Previous studies established that different combinations of PPAs and GSMs differ intrinsically in their ability to select biologically relevant DEGs. In this study, we evaluated eight combinations of PPAs and GSMs for their ability to select DEGs for prioritising gene networks. Although the different combinations yielded dissimilar DEG lists, every DEG list could segregate tumour from normal samples. Nevertheless, the choice of DEG list significantly affected the prioritisation of cancer-associated gene networks; hence, the initial choice of PPA and GSM is crucial for subsequent interactome investigations.
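
One simple way to quantify how dissimilar the DEG lists from different PPA and GSM combinations are is pairwise Jaccard overlap. The sketch below is illustrative only and is not the method used in the study: the combination labels (`PPA_A+GSM_1`, etc.), the gene identifiers, and the helper names `jaccard` and `pairwise_overlap` are all hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene lists: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty lists are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(deg_lists):
    """Map each unordered pair of combination labels to its Jaccard index."""
    return {
        (x, y): jaccard(deg_lists[x], deg_lists[y])
        for x, y in combinations(sorted(deg_lists), 2)
    }


# Toy DEG lists keyed by a hypothetical "PPA+GSM" label (not real results).
deg_lists = {
    "PPA_A+GSM_1": ["G1", "G2", "G3", "G4"],
    "PPA_A+GSM_2": ["G2", "G3", "G5"],
    "PPA_B+GSM_1": ["G1", "G6", "G7", "G8"],
}

overlaps = pairwise_overlap(deg_lists)
for pair, score in overlaps.items():
    print(pair, round(score, 3))
```

A low Jaccard index between two combinations means their DEG lists share few genes, which is the kind of dissimilarity that could propagate into different gene-network prioritisations downstream.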
