Model-Based Lead Molecule Design

A lead molecule is a chemical compound regarded as a good candidate for drug discovery. Designing a lead molecule for optimization is a complex phase in which researchers look for compounds that satisfy the required pharmaceutical properties and can then be investigated for drug development and clinical trials. Finding the optimal lead molecule is a hard problem that commonly requires searching large, high-dimensional experimental spaces. In this paper we propose to discover the optimal lead molecule with an evolutionary model-based approach in which different classes of statistical models are used to extract relevant information. The analysis compares two chemical representations of molecules: the amino-boronic acid representation and the chemical fragment representation. To deal with the high dimensionality of the fragment representation, we adopt Formal Concept Analysis and then derive the evolutionary path on a reduced number of fragments. The approach has been tested on a data set of 2500 molecules, and the results show that the strategy performs very well.
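As an illustration of how Formal Concept Analysis groups molecules by shared fragments, the following minimal sketch enumerates the formal concepts of a tiny binary context. The molecule and fragment names (`mol1`, `f1`, ...) and the helper functions are hypothetical, and the brute-force enumeration over molecule subsets is chosen only for readability; it is not the procedure used in the paper and would not scale to a data set of 2500 molecules.

```python
from itertools import combinations

# Hypothetical toy context: each molecule maps to the set of fragments it contains.
context = {
    "mol1": {"f1", "f2"},
    "mol2": {"f1", "f3"},
    "mol3": {"f1", "f2", "f3"},
}

def common_fragments(ctx, molecules):
    """Fragments shared by every molecule in `molecules` (all fragments if empty)."""
    shared = set().union(*ctx.values())
    for m in molecules:
        shared &= ctx[m]
    return frozenset(shared)

def molecules_with(ctx, fragments):
    """Molecules that contain every fragment in `fragments`."""
    return frozenset(m for m, frags in ctx.items() if fragments <= frags)

def formal_concepts(ctx):
    """Enumerate all (extent, intent) pairs by closing every subset of molecules.

    Brute force: exponential in the number of molecules, for illustration only.
    """
    concepts = set()
    names = list(ctx)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_fragments(ctx, subset)
            extent = molecules_with(ctx, intent)
            concepts.add((extent, intent))
    return concepts

if __name__ == "__main__":
    # Print concepts from the largest extent to the smallest.
    for extent, intent in sorted(formal_concepts(context), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

Each concept pairs a maximal group of molecules with the full set of fragments they all share. Working with such fragment groups instead of individual fragments is one way the search space could be reduced before the evolutionary step.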
