MoDeSuS: A Machine Learning Tool for Selection of Molecular Descriptors in QSAR Studies Applied to Molecular Informatics

Selecting the most relevant molecular descriptors to describe a target variable in QSAR (Quantitative Structure-Activity Relationship) modelling is a challenging combinatorial optimization problem. This paper presents a novel software tool that addresses this task for both regression and classification modelling. The methodology implemented by the tool is organized into two phases. The first phase uses a multiobjective evolutionary technique to select subsets of descriptors. The second phase performs an external validation of the chosen descriptor subsets in order to improve reliability. The tool's functionality is illustrated through a case study on estimating ready biodegradability, an example of classification QSAR modelling. The results show the usefulness and potential of this tool, which aims to reduce development time and costs in the drug discovery process.
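The two-phase organization described above can be illustrated with a minimal sketch. It is not the tool's implementation: the synthetic data, the nearest-centroid classifier, the two objectives (internal error and subset size), the simple Pareto-based evolutionary loop, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random

def make_data(n, n_desc, rng):
    """Synthetic two-class data; only descriptors 0 and 1 carry signal (assumption)."""
    X = [[rng.gauss(0, 1) for _ in range(n_desc)] for _ in range(n)]
    y = [1 if row[0] + row[1] > 0 else 0 for row in X]
    return X, y

def centroid_error(X_fit, y_fit, X_eval, y_eval, subset):
    """Error rate of a nearest-centroid classifier restricted to `subset` columns."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_fit, y_fit) if t == c]
        if not rows:
            return 1.0  # degenerate split: treat as worst case
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in subset]
    wrong = 0
    for x, t in zip(X_eval, y_eval):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(subset, cents[c]))
                for c in cents}
        wrong += min(dist, key=dist.get) != t
    return wrong / len(X_eval)

def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and differs from it."""
    return all(p <= q for p, q in zip(a, b)) and a != b

def pareto_front(scored):
    """Keep the subsets whose objectives are not dominated by any other."""
    return {s: f for s, f in scored.items()
            if not any(dominates(g, f) for g in scored.values())}

def select_subsets(train, inner, n_desc, rng, generations=30, pop_size=20):
    """Phase 1 (sketch): evolve descriptor subsets, minimizing (error, size)."""
    def evaluate(s):
        return (centroid_error(*train, *inner, sorted(s)), len(s))
    pop = {}
    while len(pop) < pop_size:  # random non-empty initial subsets
        s = frozenset(j for j in range(n_desc) if rng.random() < 0.5)
        if s:
            pop[s] = evaluate(s)
    for _ in range(generations):
        parents = list(pop)
        merged = dict(pop)
        for _ in range(pop_size):
            a, b = rng.sample(parents, 2)
            # uniform crossover followed by a single bit-flip mutation
            child = {j for j in range(n_desc)
                     if j in (a if rng.random() < 0.5 else b)}
            child ^= {rng.randrange(n_desc)}
            if child:
                merged[frozenset(child)] = evaluate(frozenset(child))
        front = pareto_front(merged)
        rest = sorted((s for s in merged if s not in front), key=merged.get)
        pop = dict(front)
        for s in rest:  # fill remaining slots with the best dominated subsets
            if len(pop) >= pop_size:
                break
            pop[s] = merged[s]
    return pareto_front(pop)

def run_demo(seed=0, n=300, n_desc=8):
    """Phase 2 (sketch): score each selected subset on data unseen in phase 1."""
    rng = random.Random(seed)
    X, y = make_data(n, n_desc, rng)
    a, b = n // 2, 3 * n // 4
    train, inner, outer = (X[:a], y[:a]), (X[a:b], y[a:b]), (X[b:], y[b:])
    front = select_subsets(train, inner, n_desc, rng)
    return {s: (obj, centroid_error(*train, *outer, sorted(s)))
            for s, obj in front.items()}

if __name__ == "__main__":
    for subset, ((err, size), ext_err) in run_demo().items():
        print(sorted(subset), f"internal={err:.3f}", f"external={ext_err:.3f}")
```

Each printed line shows one non-dominated descriptor subset with its internal error from the selection phase and its error on the external hold-out set. A large gap between the two would suggest that the subset was overfitted during selection, which is the kind of reliability problem the second phase is meant to expose.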
