An Evolutionary Optimizer of libsvm Models

This user guide describes the rationale behind, and the modus operandi of a Unix script-driven package for evolutionary searching of optimal Support Vector Machine model parameters as computed by the libsvm package, leading to support vector machine models of maximal predictive power and robustness. Unlike common libsvm parameterizing engines, the current distribution includes the key choice of best-suited sets of attributes/descriptors, in addition to the classical libsvm operational parameters (kernel choice, kernel parameters, cost, and so forth), allowing a unified search in an enlarged problem space. It relies on an aggressive, repeated cross-validation scheme to ensure a rigorous assessment of model quality. Primarily designed for chemoinformatics applications, it also supports the inclusion of decoy instances, for which the explained property (bioactivity) is, strictly speaking, unknown but presumably “inactive”, thus additionally testing the robustness of a model to noise. The package was developed with parallel computing in mind, supporting execution on both multi-core workstations as well as compute cluster environments. It can be downloaded from http://infochim.u-strasbg.fr/spip.php?rubrique178.

[1]  Benjamin Parent,et al.  Fuzzy Tricentric Pharmacophore Fingerprints, 1. Topological Fuzzy Pharmacophore Triplets and Adapted Molecular Similarity Scoring Schemes , 2006, J. Chem. Inf. Model..

[2]  Yaming Wang,et al.  Study on Parameter Optimization for Support Vector Regression in Solving the Inverse ECG Problem , 2013, Comput. Math. Methods Medicine.

[3]  John P. Overington,et al.  ChEMBL: a large-scale bioactivity database for drug discovery , 2011, Nucleic Acids Res..

[4]  D. Horvath,et al.  ISIDA Property‐Labelled Fragment Descriptors , 2010, Molecular informatics.

[5]  Gilles Marcou,et al.  Predicting the Predictability: A Unified Approach to the Applicability Domain Problem of QSAR Models , 2009, J. Chem. Inf. Model..

[6]  Cheng-Lung Huang,et al.  A GA-based feature selection and parameters optimizationfor support vector machines , 2006, Expert Syst. Appl..

[7]  Gilles Marcou,et al.  Computational chemogenomics: Is it more than inductive transfer? , 2014, Journal of Computer-Aided Molecular Design.

[8]  L. Buydens,et al.  Determination of optimal support vector regression parameters by genetic algorithms and simplex optimization , 2005 .

[9]  Scott D. Kahn,et al.  Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure-Activity Relationships , 2005, Alternatives to laboratory animals : ATLA.

[10]  Yuan Ren,et al.  Determination of Optimal SVM Parameters by Using GA/PSO , 2010, J. Comput..

[11]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[12]  Horvath Dragos,et al.  Predicting the predictability: a unified approach to the applicability domain problem of QSAR models. , 2009, Journal of chemical information and modeling.