Indirect two-sided relative ranking: a robust similarity measure for gene expression data

Background: A large amount of gene expression data exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth of data, which still hides many novel insights.

Results: In this paper we present a new method for comparing gene expression profiles, which we refer to as indirect two-sided relative ranking, that is robust to variations in experimental conditions. The method extends the current best approach, which compares the correlations of the up- and down-regulated genes, by introducing a comparison based on the correlations in rankings across the entire database. Because our method is robust to experimental variations, it allows a greater variety of gene expression data to be combined, which, as we show, leads to richer scientific discoveries.

Conclusions: We demonstrate the benefit of our proposed indirect method on several datasets. We first evaluate the ability of the indirect method to retrieve compounds with similar therapeutic effects across known experimental barriers, namely vehicle and batch effects, on two independent datasets (one private and one public). The indirect method significantly improves upon the previous state-of-the-art method, with improvements in recall at rank 10 of 97.03% and 49.44% on the two datasets, respectively. Next, we demonstrate that the indirect method improves classification accuracy on several additional datasets. These datasets demonstrate the use of the indirect method for classifying cancer subtypes, predicting drug sensitivity/resistance, and classifying (related) cell types. Even in the absence of a known (i.e., labeled) experimental barrier, the improvement of the indirect method on each of these datasets is statistically significant.
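The idea of an indirect comparison can be illustrated with a minimal sketch. It is not the authors' implementation: the direct measure below is a plain Spearman correlation standing in for the two-sided up/down-regulated comparison, and the function names (`direct_similarities`, `indirect_similarity`) and the toy data are assumptions made for illustration only. Two profiles are first scored against every profile in a reference database, and the two resulting lists of scores are then compared by rank correlation.

```python
from math import sqrt


def ranks(values):
    """Return 0-based ranks of the values, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def direct_similarities(profile, database):
    """Assumed stand-in direct measure: Spearman against each database profile."""
    return [spearman(profile, reference) for reference in database]


def indirect_similarity(x, y, database):
    """Compare x and y by how similarly they rank the whole database."""
    return spearman(direct_similarities(x, database),
                    direct_similarities(y, database))


# Hypothetical toy data: three reference profiles over four genes.
database = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]   # same ordering as x on a different scale
z = [4, 3, 2, 1]       # reversed ordering
same = indirect_similarity(x, y, database)
opposite = indirect_similarity(x, z, database)
```

In this sketch the two profiles are never compared gene by gene; only their relationships to the shared database enter the final score, which is the sense in which the comparison is indirect.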
