Revisiting inconsistency in large pharmacogenomic studies

Background: In 2012, two large pharmacogenomic studies, the Genomics of Drug Sensitivity in Cancer (GDSC) and the Cancer Cell Line Encyclopedia (CCLE), were published, each reporting gene expression data and measures of drug response for a large number of drugs and hundreds of cell lines. In 2013, we published a comparative analysis of the gene expression profiles for the 471 cell lines profiled in both studies and of the dose-response measurements for the 15 drugs characterized in the common cell lines by both studies. While we found good concordance in gene expression profiles, there was substantial inconsistency in the drug responses reported by the GDSC and CCLE projects. Our paper was widely discussed, and we received extensive feedback on the comparisons that we performed. This feedback, along with the release of new data, prompted us to revisit our initial analysis. Here we present a new analysis using these expanded data, in which we address the most significant suggestions for improving our published analysis: that drugs with different response characteristics should have been treated differently, that targeted therapies and broad cytotoxic drugs should have been treated differently in assessing consistency, that the consistency of both molecular profiles and drug sensitivity measurements should be compared across cell lines to accurately assess differences between the studies, that we missed some biomarkers that are consistent between studies, and that the software analysis tools we provided with our analysis should have been easier to run, particularly as the GDSC and CCLE released additional data.

Methods: For each drug, we used published sensitivity data from the GDSC and CCLE to separately estimate drug dose-response curves. We then used two statistics, the area between drug dose-response curves (ABC) and the Matthews correlation coefficient (MCC), to robustly estimate the consistency of continuous and discrete drug sensitivity measures, respectively. We also used recently released RNA-seq data together with previously published gene expression microarray data to assess inter-platform reproducibility of cell line gene expression profiles.

Results: This re-analysis supports our previous finding that gene expression data are significantly more consistent than drug sensitivity measurements. The use of new statistics to assess data consistency allowed us to identify two broad-effect drugs (17-AAG and PD-0332901) and three targeted drugs (PLX4720, nilotinib and crizotinib) with moderate to good consistency in drug sensitivity data between GDSC and CCLE. Not enough sensitive cell lines were screened in both studies to robustly assess consistency for three other targeted drugs: PHA-665752, erlotinib, and sorafenib. Concurring with our published results, we found evidence of inconsistencies in pharmacological phenotypes for the remaining eight drugs. Further, discovering “consistency” between studies required the use of multiple statistics and the selection of specific measures on a case-by-case basis.

Conclusion: Our results reaffirm our initial finding of inconsistency in drug sensitivity measures for eight of fifteen drugs screened in both GDSC and CCLE, irrespective of which statistical metric was used to assess correlation. Taken together, our findings suggest that the phenotypic data on drug response in the GDSC and CCLE continue to present challenges for robust biomarker discovery. This re-analysis provides additional support for the argument that experimental standardization and validation of pharmacogenomic response will be necessary to advance the broad use of large pharmacogenomic screens.
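
The two consistency statistics named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the three-parameter logistic curve, and the normalisation of the ABC by the width of the concentration range are all assumptions made for illustration. The MCC is computed here on binary sensitive/resistant calls for the same cell lines in two studies, and the ABC as the mean absolute gap between two fitted curves over a shared log10 concentration range.

```python
import math

def mcc(calls_a, calls_b):
    """Matthews correlation coefficient between two binary call vectors
    (e.g. sensitive = 1, resistant = 0 for the same cell lines in two studies)."""
    pairs = list(zip(calls_a, calls_b))
    tp = sum(1 for a, b in pairs if a and b)
    tn = sum(1 for a, b in pairs if not a and not b)
    fp = sum(1 for a, b in pairs if not a and b)
    fn = sum(1 for a, b in pairs if a and not b)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): return 0 when a margin is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def logistic_curve(log_ec50, slope, e_inf):
    """Hypothetical 3-parameter logistic viability curve over log10 concentration."""
    def curve(log_conc):
        return e_inf + (1.0 - e_inf) / (1.0 + 10 ** (slope * (log_conc - log_ec50)))
    return curve

def area_between_curves(curve_a, curve_b, lo, hi, n=1000):
    """Trapezoidal estimate of the area between two curves on [lo, hi],
    divided by (hi - lo) so the result lies in [0, 1] for viability fractions
    (this normalisation is an assumption)."""
    step = (hi - lo) / n
    gaps = [abs(curve_a(lo + i * step) - curve_b(lo + i * step)) for i in range(n + 1)]
    area = sum((gaps[i] + gaps[i + 1]) / 2.0 * step for i in range(n))
    return area / (hi - lo)

if __name__ == "__main__":
    # Made-up calls and curve parameters, purely for demonstration.
    study_1 = [1, 1, 0, 0, 1, 0]
    study_2 = [1, 0, 0, 0, 1, 1]
    print("MCC:", mcc(study_1, study_2))
    curve_1 = logistic_curve(log_ec50=-0.5, slope=1.0, e_inf=0.1)
    curve_2 = logistic_curve(log_ec50=0.3, slope=1.2, e_inf=0.2)
    print("ABC:", area_between_curves(curve_1, curve_2, lo=-2.0, hi=1.0))
```

An ABC near 0 means the two curves nearly coincide over the shared range, while an MCC near 1 means the two studies agree on which cell lines are sensitive.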
