A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities.

Analysis of multivariate data sets, for example from microarray studies, frequently results in lists of genes that are associated with some response of interest. The biological interpretation is often complicated by the statistical instability of the obtained gene lists, which may partly be due to functional redundancy among genes: multiple genes can play exchangeable roles in the cell. In this paper, we use the concept of exchangeability of random variables to model this functional redundancy and thereby account for the instability. We present a flexible framework for incorporating exchangeability into the representation of lists. The proposed framework supports straightforward comparison between any two lists. It can also be used to generate new, more stable gene rankings that incorporate more information from the experimental data. Using two microarray data sets, we show that the proposed method provides gene rankings that are more robust to sampling variation than those of existing methods, without compromising the biological significance of the rankings.
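To make the idea of comparing two lists under gene exchangeability concrete, the following is a minimal sketch and not the representation defined in the paper. It assumes that a gene list is encoded as an indicator vector over a fixed gene universe and that exchangeability is supplied as a symmetric, positive semidefinite matrix `K` with ones on the diagonal. The function names, the matrix `K`, the value 0.8, and the toy gene identifiers are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def indicator(gene_list, universe):
    """Encode a gene list as a 0/1 vector over the gene universe."""
    members = set(gene_list)
    return [1.0 if g in members else 0.0 for g in universe]


def bilinear(x, K, y):
    """Compute x^T K y for plain Python lists."""
    return sum(x[i] * K[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def list_similarity(list_a, list_b, universe, K):
    """Normalized similarity of two lists under exchangeability matrix K.

    With K equal to the identity this reduces to the normalized overlap
    of the two lists (an assumption of this sketch, not of the paper).
    """
    x = indicator(list_a, universe)
    y = indicator(list_b, universe)
    denom = sqrt(bilinear(x, K, x) * bilinear(y, K, y))
    return bilinear(x, K, y) / denom if denom > 0 else 0.0


# Toy example: four hypothetical genes, two lists sharing only g3.
universe = ["g1", "g2", "g3", "g4"]
list_a = ["g1", "g3"]
list_b = ["g2", "g3"]

# No exchangeability: every gene is only similar to itself.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Hypothetical exchangeability: g1 and g2 are largely interchangeable.
K = [row[:] for row in identity]
K[0][1] = K[1][0] = 0.8

sim_identity = list_similarity(list_a, list_b, universe, identity)
sim_exchangeable = list_similarity(list_a, list_b, universe, K)
print(sim_identity, sim_exchangeable)
```

In this toy setting the two lists look only half similar when genes are treated as distinct, but nearly identical once g1 and g2 are allowed to stand in for each other, which is the kind of stabilization the abstract describes.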
