How far have decision tree models come for data mining in drug discovery?

Machine learning (ML) methods assist drug discovery mostly through data mining in virtual screening (VS). If the target is sufficiently characterized, for example through knowledge of its three-dimensional (3D) structure or gene sequence, we can take a structure-based VS (SBVS) approach and run molecular docking, molecular dynamics, or 3D-similarity matching experiments. More often, however, we know only a set of molecular structures and their biological activities, and so we perform a ligand-based VS (LBVS) [1]. The results of an LBVS are computationally much less expensive to obtain than those of an SBVS. They can serve as the basis of chemical database queries and, ideally, enhance our understanding at an early stage of drug discovery of how a molecule’s action may come about (hypothesis generation). Decision trees are well suited to this task.
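To illustrate how a decision tree can turn an LBVS data set into a testable hypothesis, the following is a minimal sketch in plain Python. It assumes a hypothetical toy data set of two descriptors (logP and molecular weight) with invented active/inactive labels, and it uses a simple Gini-based recursive split rather than any specific published algorithm. The function names (`gini`, `best_split`, `build_tree`, `predict`, `rules`) are illustrative and not taken from the literature cited here.

```python
from collections import Counter

# Hypothetical toy data: (logP, molecular weight) -> 1 (active) / 0 (inactive).
# All values are invented for illustration only.
DATA = [
    ((1.2, 250.0), 1),
    ((0.8, 310.0), 1),
    ((2.1, 280.0), 1),
    ((4.5, 520.0), 0),
    ((5.2, 480.0), 0),
    ((3.9, 610.0), 0),
    ((1.5, 550.0), 0),
    ((4.8, 300.0), 1),
]
FEATURES = ["logP", "mol_weight"]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows):
    """Return (weighted_gini, feature_index, threshold) or None if no split exists."""
    best = None
    for f in range(len(FEATURES)):
        values = sorted({x[f] for x, _ in rows})
        # Candidate thresholds are midpoints between neighbouring distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y for x, y in rows if x[f] <= thr]
            right = [y for x, y in rows if x[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def build_tree(rows, depth=0, max_depth=2):
    """Recursively grow a small binary tree; leaves store the majority class."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows)
    if depth >= max_depth or gini(labels) == 0.0 or split is None:
        return {"leaf": majority}
    _, f, thr = split
    left = [(x, y) for x, y in rows if x[f] <= thr]
    right = [(x, y) for x, y in rows if x[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree(left, depth + 1, max_depth),
        "right": build_tree(right, depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def rules(tree, path=()):
    """Flatten the tree into human-readable IF/THEN rules."""
    if "leaf" in tree:
        cond = " AND ".join(path) or "always"
        label = "active" if tree["leaf"] == 1 else "inactive"
        return [f"IF {cond} THEN {label}"]
    name, thr = FEATURES[tree["feature"]], tree["threshold"]
    return (rules(tree["left"], path + (f"{name} <= {thr:g}",))
            + rules(tree["right"], path + (f"{name} > {thr:g}",)))


TREE = build_tree(DATA)
for rule in rules(TREE):
    print(rule)
```

On this invented data the tree reduces to a single molecular-weight threshold. The point of the sketch is that the fitted model can be read directly as a rule, which is what makes the approach useful for chemical database queries and hypothesis generation.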

[1] Taravat Ghafourian, et al. Decision trees to characterise the roles of permeability and solubility on the prediction of oral absorption. 2015, European journal of medicinal chemistry.

[2] J C Gertrudes, et al. Machine learning techniques and drug design. 2012, Current medicinal chemistry.

[3] Jon Atli Benediktsson, et al. Automatic selection of molecular descriptors using random forest: Application to drug discovery. 2017, Expert Syst. Appl.

[4] Thomas Blaschke, et al. The rise of deep learning in drug discovery. 2018, Drug discovery today.

[5] Peter Kontschieder, et al. Deep Neural Decision Forests. 2015, IEEE International Conference on Computer Vision (ICCV).

[6] Lu Zhang, et al. From machine learning to deep learning: progress in machine intelligence for rational drug discovery. 2017, Drug discovery today.

[7] Felix Hammann, et al. Decision tree models for data mining in hit discovery. 2012, Expert opinion on drug discovery.

[8] Bo-Han Su, et al. Rule-Based Prediction Models of Cytochrome P450 Inhibition. 2015, J. Chem. Inf. Model.

[9] J. R. Quinlan. Constructing Decision Trees. 1993.

[10] J. Topliss, et al. A manual method for applying the Hansch approach to drug design. 1977, Journal of medicinal chemistry.

[11] P. Blower, et al. Decision tree methods in pharmaceutical research. 2006, Current topics in medicinal chemistry.

[12] Robert P. Sheridan, et al. Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships. 2015, J. Chem. Inf. Model.

[13] Sean Ekins, et al. Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets. 2017, Molecular pharmaceutics.

[14] Matthew L. Danielson, et al. In Silico and in Vitro Assessment of OATP1B1 Inhibition in Drug Discovery. 2018, Molecular pharmaceutics.

[15] Faisal Saeed, et al. Ensemble learning method for the prediction of new bioactive molecules. 2018, PloS one.

[16] J. Topliss, et al. Utilization of operational schemes for analog synthesis in drug design. 1972, Journal of medicinal chemistry.

[17] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning. 1992.

[18] Antonio Lavecchia, et al. Machine-learning approaches in drug discovery: methods and applications. 2015, Drug discovery today.

[19] Günter Klambauer, et al. DeepTox: Toxicity Prediction using Deep Learning. 2016, Front. Environ. Sci.

[20] Hojung Nam, et al. Prediction models for drug-induced hepatotoxicity by using weighted molecular fingerprints. 2017, BMC Bioinformatics.

[21] W. Tong, et al. Development of Decision Forest Models for Prediction of Drug-Induced Liver Injury in Humans Using A Large Set of FDA-approved Drugs. 2017, Scientific Reports.

[22] Markus Wagener, et al. Potential Drugs and Nondrugs: Prediction and Identification of Important Structural Features. 2000, J. Chem. Inf. Comput. Sci.

[23] Gustavo Henrique Goulart Trossini, et al. Use of machine learning approaches for novel drug discovery. 2016, Expert opinion on drug discovery.

[24] Dong-Sheng Cao, et al. Predicting human intestinal absorption with modified random forest approach: a comprehensive evaluation of molecular representation, unbalanced data, and applicability domain issues. 2017.