Evaluation of Analytical Methods for Connectivity Map Data

Connectivity map data and associated methodologies have become valuable tools for understanding drug mechanism of action (MOA) and for discovering new indications for drugs. However, few systematic evaluations have assessed the accuracy of these methodologies, partly because benchmarking data sets have been lacking. Iskar et al. (PLoS Comput. Biol. 6, 2010) predicted the Anatomical Therapeutic Chemical (ATC) classification of drugs from drug-induced gene expression profile similarity (DIPS). They quantified the accuracy of their method by the area under the Receiver Operating Characteristic (ROC) curve (AUC). We adopt the same data and extend the methodology with a simpler eXtreme cosine (XCos) method, which in this limited setting performs better than the Kolmogorov-Smirnov (KS) statistic. For partial AUC, a statistic more relevant to practical drug repositioning, XCos performs 17% better than the DIPS method (p = 1.2e-7). We also observe that smaller gene signatures (100 probes) perform better than larger ones (500 probes), and that DMSO controls from within the same batch obviate the need for mean centering. As expected, prediction accuracy varies across ATC codes. A good transcriptional response to drug treatment appears necessary but not sufficient to achieve high AUCs. Certain ATC codes, such as those corresponding to corticosteroids, had much higher AUCs, possibly owing to strong transcriptional responses and a consistent MOA.
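The scoring and evaluation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation:

* Each profile is assumed to be a dictionary mapping probe IDs to log fold changes against a same-batch DMSO control.
* `xcos` assumes that XCos is the cosine similarity restricted to the query's n most up-regulated and n most down-regulated probes. The abstract does not spell out this reading.
* `connectivity_score` follows the KS-based score of the original Connectivity Map [12]. A positive score means the query's up-regulated probes sit near the top of the reference ranking and its down-regulated probes sit near the bottom.
* `partial_auc` integrates the empirical ROC curve only up to a chosen false positive rate, `max_fpr`.

All function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed data model (hypothetical): probe ID -> log fold change of the
# drug-treated sample relative to a DMSO control from the same batch.
Profile = Dict[str, float]


def extreme_probes(profile: Profile, n: int) -> Tuple[List[str], List[str]]:
    """Return the n most up-regulated and the n most down-regulated probes."""
    ranked = sorted(profile, key=profile.get, reverse=True)
    return ranked[:n], ranked[::-1][:n]


def xcos(query: Profile, reference: Profile, n: int = 100) -> float:
    """Cosine similarity restricted to the query's extreme probes.

    Assumed reading of "eXtreme cosine": only the n most up- and n most
    down-regulated probes of the query enter the dot product.
    """
    up, down = extreme_probes(query, n)
    # dict.fromkeys removes duplicates if the profile has fewer than 2n probes.
    probes = [p for p in dict.fromkeys(up + down) if p in reference]
    dot = sum(query[p] * reference[p] for p in probes)
    norm_q = math.sqrt(sum(query[p] ** 2 for p in probes))
    norm_r = math.sqrt(sum(reference[p] ** 2 for p in probes))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0


def ks_enrichment(tags: Sequence[str], ranked: Sequence[str]) -> float:
    """KS-style enrichment of a tag set in a ranked probe list."""
    pos = {p: i + 1 for i, p in enumerate(ranked)}  # 1-based ranks
    v = sorted(pos[t] for t in tags if t in pos)
    n, t = len(ranked), len(v)
    if t == 0:
        return 0.0
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(query: Profile, reference: Profile, n: int = 100) -> float:
    """KS-based score of the query's up/down tags against a reference profile."""
    up, down = extreme_probes(query, n)
    ranked = sorted(reference, key=reference.get, reverse=True)  # most up first
    ks_up = ks_enrichment(up, ranked)
    ks_down = ks_enrichment(down, ranked)
    # Same sign for both tag sets: no consistent connection, score is zero.
    if ks_up * ks_down > 0:
        return 0.0
    return ks_up - ks_down


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points, thresholding at each distinct score (ties grouped)."""
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    tp = fp = i = 0
    points = [(0.0, 0.0)]
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(scores: Sequence[float], labels: Sequence[int], max_fpr: float = 1.0) -> float:
    """Unnormalised area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    points = roc_points(scores, labels)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # clip the last segment by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    return area
```

Here is one hypothetical way to use these pieces. Score every pair of drug profiles with `xcos` or `connectivity_score`, and label a pair positive when the two drugs share an ATC code. Then pass the scores and labels to `partial_auc` with a small `max_fpr` such as 0.1, which is an illustrative cutoff and not necessarily the one used in this study. With `max_fpr=1.0` the function returns the full AUC. Because tied scores are grouped into one ROC step, ties earn half credit, consistent with the Mann-Whitney interpretation of AUC.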

[1] Ilya Shmulevich, et al. ProbCD: enrichment analysis accounting for categorization uncertainty, 2007, BMC Bioinformatics.

[2] Adam C. Gower, et al. Discovering biological connections between experimental conditions based on common patterns of differential gene expression, 2011, BMC Bioinformatics.

[3] R. Shields, et al. mRNA Expression Signatures of Human Skeletal Muscle Atrophy Identify a Natural Compound that Increases Muscle Mass, 2011, Cell Metabolism.

[4] A. Butte, et al. Expression-based Pathway Signature Analysis (EPSA): Mining publicly available microarray data for insight into human disease, 2008, BMC Medical Genomics.

[5] Mario Medvedovic, et al. Generalized random set framework for functional enrichment analysis using primary genomics datasets, 2011, Bioinformatics.

[6] Jae Yong Cho, et al. Gene Expression Signature Analysis Identifies Vorinostat as a Candidate Therapy for Gastric Cancer, 2011, PLoS ONE.

[7] Joel Dudley, et al. Exploiting drug-disease relationships for computational drug repositioning, 2011, Briefings in Bioinformatics.

[8] Pankaj Agarwal, et al. Gene Vector Analysis (Geneva): A unified method to detect differentially-regulated gene sets and similar microarray experiments, 2008, BMC Bioinformatics.

[9] Peer Bork, et al. Drug-Induced Regulation of Target Expression, 2010, PLoS Computational Biology.

[10] Rafael A Irizarry, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data, 2003, Biostatistics.

[11] Michael J. Barratt, et al. Evaluation of phenoxybenzamine in the CFA model of pain following gene expression studies and connectivity mapping, 2010, Molecular Pain.

[12] Paul A Clemons, et al. The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease, 2006, Science.

[13] J. Kishimoto, et al. Identification of novel hair-growth inducers by means of connectivity mapping, 2010, The FASEB Journal.

[14] Deepak K Rajpal, et al. Applications of Connectivity Map in drug discovery and development, 2012, Drug Discovery Today.

[15] Jesse M. Engreitz, et al. ProfileChaser: searching microarray repositories based on genome-wide patterns of differential expression, 2011, Bioinformatics.

[16] Dennis B. Troup, et al. NCBI GEO: archive for functional genomics data sets—10 years on, 2010, Nucleic Acids Research.

[17] R. Tagliaferri, et al. Discovery of drug mode of action and drug repositioning from transcriptional responses, 2010, Proceedings of the National Academy of Sciences.

[18] I. Guyon, et al. Detecting stable clusters using principal component analysis, 2003, Methods in Molecular Biology.

[19] Alexander A. Morgan, et al. Computational Repositioning of the Anticonvulsant Topiramate for Inflammatory Bowel Disease, 2011, Science Translational Medicine.

[20] Xavier Robin, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves, 2011, BMC Bioinformatics.

[21] Yajie Wang, et al. Using Functional Signatures to Identify Repositioned Drugs for Breast, Myelogenous Leukemia and Prostate Cancer, 2012, PLoS Computational Biology.

[22] Mario Medvedovic, et al. LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data, 2009, Bioinformatics.

[23] Robert Gentleman, et al. Querying Genomic Databases: Refining the Connectivity Map, 2012, Statistical Applications in Genetics and Molecular Biology.

[24] Chun Li, et al. Strategy for encoding and comparison of gene expression signatures, 2007, Genome Biology.