EMDomics: a robust and powerful method for the identification of genes differentially expressed between heterogeneous classes

Motivation: A major goal of biomedical research is to identify molecular features associated with a biological or clinical class of interest. Differential expression analysis has long been used for this purpose; however, conventional methods perform poorly when applied to data with high within-class heterogeneity.

Results: To address this challenge, we developed EMDomics, a new method that uses the Earth mover's distance (EMD) to measure the overall difference between the distributions of a gene's expression in two classes of samples, and uses permutations to obtain a q-value for each gene. We applied EMDomics to the challenging problem of identifying genes associated with drug resistance in ovarian cancer. We also used simulated data to evaluate the sensitivity and specificity of EMDomics for identifying genes differentially expressed between classes with high within-class heterogeneity. On both the simulated and the real biological data, EMDomics outperformed competing approaches for identifying differentially expressed genes, and it was significantly more powerful than conventional methods for identifying drug resistance-associated gene sets. EMDomics represents a new approach for identifying genes differentially expressed between heterogeneous classes and has utility in a wide range of complex biomedical conditions in which sample classes show within-class heterogeneity.

Availability and implementation: The R package is available at http://www.bioconductor.org/packages/release/bioc/html/EMDomics.html

Contact: abeck2@bidmc.harvard.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
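
The two computational steps described in the Results (an EMD score per gene, then permutation-based q-values) can be sketched as follows. This is a minimal, hypothetical Python sketch, not the EMDomics R/Bioconductor implementation. It assumes the EMD is the one-dimensional Wasserstein-1 distance between the raw empirical samples (the area between the two empirical CDFs), and it assumes a SAM-style FDR estimate in which permuted scores are pooled across genes and turned into q-values by a running minimum. The package itself may bin expression values into histograms and estimate q-values differently. The function names `emd_1d`, `emd_scores` and `permutation_qvalues` and the toy data are illustrative only.

```python
import bisect
import random

# Hypothetical sketch, not the EMDomics (R/Bioconductor) implementation.
# Assumption: the EMD is the 1-D Wasserstein-1 distance between the raw
# empirical samples (no histogram binning).


def emd_1d(x, y):
    """EMD between two 1-D samples: the area between their empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    grid = sorted(xs + ys)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on the interval [left, right).
        fx = bisect.bisect_right(xs, left) / len(xs)
        fy = bisect.bisect_right(ys, left) / len(ys)
        total += abs(fx - fy) * (right - left)
    return total


def emd_scores(expr, labels):
    """EMD score per gene; expr maps gene -> values, labels holds 0/1 per sample."""
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = emd_1d(a, b)
    return scores


def permutation_qvalues(expr, labels, n_perm=100, seed=0):
    """Permutation-based q-values (assumed SAM-style pooled-null FDR estimate)."""
    rng = random.Random(seed)
    observed = emd_scores(expr, labels)
    # Null distribution: scores of all genes under shuffled class labels.
    null = []
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)
        null.extend(emd_scores(expr, perm).values())
    null.sort()
    obs_sorted = sorted(observed.values())

    fdr = {}
    for gene, s in observed.items():
        # Average number of null scores >= s per permutation (expected false calls)
        n_false = (len(null) - bisect.bisect_left(null, s)) / n_perm
        # divided by the number of genes actually called at threshold s (always >= 1).
        n_called = len(obs_sorted) - bisect.bisect_left(obs_sorted, s)
        fdr[gene] = min(1.0, n_false / n_called)

    # q-value: smallest FDR over all thresholds at which the gene is called,
    # i.e. a running minimum taken from the lowest score upward.
    qvalues, running = {}, 1.0
    for gene in sorted(observed, key=observed.get):
        running = min(running, fdr[gene])
        qvalues[gene] = running
    return observed, qvalues


if __name__ == "__main__":
    # Toy data (hypothetical): one shifted gene and one gene identical across classes.
    expr = {
        "DE": [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4],
        "null": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    }
    labels = [0] * 5 + [1] * 5
    scores, q = permutation_qvalues(expr, labels, n_perm=200)
    print(scores, q)
```

Because the EMD compares whole distributions rather than means, a gene whose expression is shifted in only a subset of one class still receives a nonzero score. This is the property the abstract relies on for heterogeneous classes. In the toy data, the shifted gene scores 5.0 and receives a small q-value, while the gene with identical class distributions scores 0.0 and receives q = 1.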
