AICM: A Genuine Framework for Correcting Inconsistency Between Large Pharmacogenomics Datasets

The inconsistency of open pharmacogenomics datasets produced by different studies limits the use of pharmacogenomics in biomarker discovery. Investigation of multiple pharmacogenomics datasets confirmed that the pairwise correlation of sensitivity data between drugs, or rows, across different studies (drug-wise) is relatively low, while the pairwise correlation between cell lines, or columns, across different studies (cell-wise) is considerably strong. This observation, common across multiple pharmacogenomics datasets, suggests that the studies share a subtle underlying consistency (i.e., strong cell-wise correlation). However, substantial noise is also present (i.e., weak drug-wise correlation) and has prevented researchers from comfortably using the data directly. Motivated by this observation, we propose a novel framework for addressing the inconsistency between large-scale pharmacogenomics datasets. Our method can significantly boost the drug-wise correlation and can be easily applied to the re-summarized and normalized datasets proposed by others. We also evaluate our algorithm against many different criteria to demonstrate that the corrected datasets are not only consistent but also biologically meaningful. Finally, we propose to extend our main algorithm into a framework so that, as more datasets become publicly available, it can hopefully offer “ground-truth” guidance for reference.
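To make the two correlation measures concrete, the following is a minimal sketch, not the AICM algorithm itself. It assumes each study is a drugs-by-cell-lines matrix with matched rows and columns, uses Pearson correlation (the abstract does not fix the correlation measure), and runs on purely synthetic data. The function names `rowwise_pearson`, `drug_wise`, and `cell_wise`, and all simulation parameters, are hypothetical choices made for illustration.

```python
import numpy as np


def rowwise_pearson(a, b):
    """Pearson correlation between matching rows of two equally shaped matrices."""
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    num = (a_c * b_c).sum(axis=1)
    den = np.sqrt((a_c ** 2).sum(axis=1) * (b_c ** 2).sum(axis=1))
    return num / den


def drug_wise(a, b):
    """One correlation per drug (row), computed across cell lines."""
    return rowwise_pearson(a, b)


def cell_wise(a, b):
    """One correlation per cell line (column), computed across drugs."""
    return rowwise_pearson(a.T, b.T)


# Synthetic example (assumed, not real study data): drugs differ strongly in
# average potency, cell lines differ only slightly, and each study adds its
# own independent measurement noise.
rng = np.random.default_rng(0)
n_drugs, n_cells = 15, 200
drug_effect = rng.normal(0.0, 2.0, size=(n_drugs, 1))
cell_effect = rng.normal(0.0, 0.3, size=(1, n_cells))
truth = drug_effect + cell_effect
study_a = truth + rng.normal(0.0, 1.0, size=truth.shape)
study_b = truth + rng.normal(0.0, 1.0, size=truth.shape)

drug_corr = drug_wise(study_a, study_b)
cell_corr = cell_wise(study_a, study_b)
print("median drug-wise correlation:", np.median(drug_corr))
print("median cell-wise correlation:", np.median(cell_corr))
```

Under these assumed settings, the large differences between drugs dominate each column, so the cell-wise correlation comes out high, while within a single drug the small cell-line signal is swamped by noise, so the drug-wise correlation stays low. This mirrors the pattern described above, although real datasets need not arise from this simple additive model.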
