ECoFFeS: A Software Using Evolutionary Computation for Feature Selection in Drug Discovery

Feature selection is of particular importance in the field of drug discovery. Many feature selection methods have been proposed in recent decades. Among them, evolutionary computation has gained increasing attention owing to its strong global search ability. However, there is still no simple and efficient software that lets drug developers take advantage of evolutionary computation for feature selection. To remedy this, this paper presents ECoFFeS, a user-friendly, standalone software package. ECoFFeS is expected to lower the entry barrier for drug developers who want to apply evolutionary algorithms to their own feature selection problems. To the best of our knowledge, it is the first software to integrate a set of evolutionary algorithms (including two modified evolutionary algorithms proposed by the authors) with various evaluation combinations for feature selection. Specifically, ECoFFeS supports both single-objective and multi-objective evolutionary algorithms, and both regression- and classification-based models, to meet different requirements. Five drug discovery data sets are bundled with ECoFFeS. In addition, ECoFFeS supports parallel execution to reduce the total analysis time. The source code of ECoFFeS is available at https://github.com/JiaweiHuang/ECoFFeS/.
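To make the idea of evolutionary feature selection concrete, the following is a minimal sketch of a single-objective genetic algorithm with a classification-based evaluation. It is not the ECoFFeS implementation: the function names (`make_data`, `fitness`, `ga_select`), the synthetic data, the nearest-centroid classifier, and all parameter values are assumptions chosen only for illustration.

```python
import random


def make_data(rng, n_samples=60, n_features=8, informative=(0, 1)):
    """Synthetic two-class data (hypothetical): only `informative` columns carry signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift class 1 along the informative features
        X.append(row)
        y.append(label)
    return X, y


def fitness(mask, X, y, penalty=0.01):
    """Nearest-centroid training accuracy on the selected features, minus a size penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset is treated as worthless
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        dist = {
            c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
            for c in centroids
        }
        correct += min(dist, key=dist.get) == t
    return correct / len(X) - penalty * len(cols)


def ga_select(X, y, rng, pop_size=20, generations=30, p_mut=0.1):
    """Evolve binary feature masks with tournament selection, crossover, and mutation."""
    n = len(X[0])
    score = lambda m: fitness(m, X, y)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best mask over unchanged
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=score)
    return best, score(best)


rng = random.Random(0)
X, y = make_data(rng)
mask, score = ga_select(X, y, rng)
print(mask, round(score, 3))
```

Each candidate solution is a binary mask over the features, and the fitness combines a model-quality term with a penalty on subset size. A multi-objective variant would instead keep accuracy and subset size as separate objectives, and a regression-based variant would replace the classifier with a regression error measure.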
