Exploiting sample variability to enhance multivariate analysis of microarray data

MOTIVATION
Biological and technical variability is intrinsic in any microarray experiment. While most approaches aim to account for this variability, they do not actively exploit it. Here, we consider a novel approach that uses the variability between arrays to provide an extra source of information that can enhance gene expression analyses.

RESULTS
We develop a method that uses sample similarity to incorporate sample variability into the analysis of gene expression profiles. This allows each pairwise correlation calculation to borrow information from all the data in the experiment. Results on synthetic and human cancer microarray datasets show that the inclusion of this information leads to a significant increase in the ability to identify previously characterized relationships and a reduction in false discovery rate, when compared to a standard analysis using Pearson correlation. The information carried by the variability between arrays can be exploited to significantly improve the analysis of gene expression data.

AVAILABILITY
Matlab script files are available from the author.

SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
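
The abstract does not spell out how sample similarity enters the correlation calculation, so the following is only an illustrative sketch of one way such borrowing could work, and should not be read as the authors' method. It assumes (hypothetically) that the similarity between arrays is measured by the Pearson correlation between their expression vectors, and that this samples-by-samples matrix S replaces the identity in the inner products of the usual Pearson formula, giving r = x'Sy / sqrt(x'Sx * y'Sy) for mean-centred gene profiles x and y. All function names are made up for the illustration.

```python
import math

def center(values):
    """Subtract the mean from a profile."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def bilinear(x, s, y):
    """Compute x^T S y for vectors x, y and a square matrix s (list of rows)."""
    return sum(x[i] * s[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def identity(n):
    """n x n identity matrix; with this as S the measure is plain Pearson."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def similarity_correlation(x, y, s):
    """Correlation of two profiles under the inner product defined by s.

    ASSUMPTION: s is symmetric positive semidefinite (true for a
    correlation matrix), so the result lies in [-1, 1] up to rounding.
    """
    xc, yc = center(x), center(y)
    numerator = bilinear(xc, s, yc)
    product = max(bilinear(xc, s, xc) * bilinear(yc, s, yc), 0.0)
    denominator = math.sqrt(product)
    return numerator / denominator if denominator > 0 else 0.0

def sample_similarity(data):
    """Pearson correlation between arrays (columns) of a genes x samples table.

    ASSUMPTION: this choice of similarity is illustrative only.
    """
    n_samples = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_samples)]
    eye = identity(len(data))
    return [[similarity_correlation(columns[i], columns[j], eye)
             for j in range(n_samples)] for i in range(n_samples)]

# Toy genes x samples table (made-up numbers, not data from the paper).
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.1, 5.9, 8.2, 9.9],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [0.5, 0.7, 0.2, 0.9, 0.4],
]

s = sample_similarity(data)
standard = similarity_correlation(data[0], data[2], identity(5))
borrowed = similarity_correlation(data[0], data[2], s)
print("Pearson:", standard, "similarity-weighted:", borrowed)
```

With the identity matrix in place of S the function reduces to the standard Pearson correlation, which makes the two analyses directly comparable on the same gene pair. Because S is built from every gene in the table, each pairwise value depends on the whole experiment rather than on the two profiles alone.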
