Adaptive one-class Gaussian processes allow accurate prioritization of oncology drug targets

MOTIVATION: The cost of drug development has increased dramatically over the last decades, with the number of new drugs approved per billion US dollars spent on R&D halving roughly every nine years. The selection and prioritization of targets is one of the most influential decisions in drug discovery. Here we present a Gaussian Process model for the prioritization of drug targets, cast as a problem of learning from only positive and unlabeled examples.

RESULTS: Because the absence of negative samples rules out standard methods for automatic hyperparameter selection, we propose a novel approach for selecting the kernel hyperparameters of One-Class Gaussian Processes. We compare our method with state-of-the-art approaches on benchmark datasets and then apply it to druggability prediction of oncology drug targets. Starting from a small training set of 102 validated oncology targets, our score reaches an AUC of 0.90 on a set of clinical trial targets. It recovers the majority of known drug targets and can be used to identify a novel set of proteins as drug target candidates.

AVAILABILITY: Source code implemented in Python is freely available for download at https://github.com/AntonioDeFalco/Adaptive-OCGP.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
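The model builds on One-Class Gaussian Processes, where a GP regressor is trained on positive examples only. As a rough illustration, the sketch below follows the generic one-class GP formulation of Kemmler et al.: every positive gets target 1, the prior has zero mean, and test points are scored by the posterior mean or the negative posterior variance. It also computes a rank-based ROC AUC of the kind reported above. This is a minimal sketch assuming numpy, a squared-exponential kernel, and a hand-picked length scale and noise level; it does not reproduce the adaptive hyperparameter selection proposed in the paper. The names `rbf_kernel`, `ocgp_scores` and `auc` are hypothetical, and the data are synthetic.

```python
import numpy as np


def rbf_kernel(A, B, length_scale):
    """Squared-exponential kernel exp(-||a - b||^2 / (2 * length_scale^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * length_scale**2))


def ocgp_scores(X_train, X_test, length_scale=1.0, noise=0.1):
    """One-class GP scores: GP regression with all training targets set to 1.

    Assumption: length_scale and noise are fixed by hand here; they stand in
    for the hyperparameters that the paper selects with its own procedure.
    With a zero-mean prior, points far from the positives drift toward
    mean 0 and variance 1, so both returned scores drop for them.
    Returns (posterior mean, negative posterior variance).
    """
    n = len(X_train)
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(n)
    K_s = rbf_kernel(X_train, X_test, length_scale)  # shape (n_train, n_test)
    L = np.linalg.cholesky(K)                        # K = L @ L.T
    y = np.ones(n)                                   # positives only: target 1
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha                             # k_*^T K^{-1} y
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)                 # k(x, x) = 1 for this kernel
    return mean, -var


def auc(pos_scores, neg_scores):
    """ROC AUC as the Mann-Whitney statistic (ties count as 1/2)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


# Synthetic, well-separated clusters (illustration only, not the paper's data).
rng = np.random.default_rng(0)
X_pos_train = rng.normal(0.0, 1.0, size=(50, 5))
X_pos_test = rng.normal(0.0, 1.0, size=(30, 5))
X_neg_test = rng.normal(6.0, 1.0, size=(30, 5))
X_test = np.vstack([X_pos_test, X_neg_test])

mean, neg_var = ocgp_scores(X_pos_train, X_test, length_scale=2.0)
print("AUC (mean score):     ", auc(mean[:30], mean[30:]))
print("AUC (-variance score):", auc(neg_var[:30], neg_var[30:]))
```

The length scale is exactly the kind of kernel hyperparameter that cannot be tuned by cross-validated classification error when no negatives are available; in this sketch it is simply set by hand, which is the gap the proposed selection method addresses.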
