Prediction of Solubility of Fullerene C60 in Various Organic Solvents by Genetic Algorithm-Multiple Linear Regression

A quantitative structure-property relationship (QSPR) study is presented for modeling and predicting the solubility of fullerene (C60) in various solvents. A data set of 36 benzene derivatives is used. Various kinds of molecular descriptors were calculated to represent the molecular structures of the compounds. The best-fitting descriptors were selected by stepwise multiple linear regression (SW-MLR) and by multiple linear regression with a genetic algorithm for variable selection (GA-MLR). The models were validated using leave-one-out (LOO) and leave-multiple-out (LMO) cross-validation, an external test set, and a Y-randomization test. Outliers were also examined to better understand in which cases large errors should be expected and to improve the predictive models. Comparison of the results showed that genetic algorithm selection outperformed stepwise selection. In addition, electronegativity, dispersion interactions in solution, and molecular volume were the main independent factors contributing to the solubility of fullerene in the studied solvents.
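
The abstract does not state the GA settings, the fitness function, or the descriptor values. The following is therefore only a minimal pure-Python sketch, under stated assumptions, of how GA-MLR descriptor selection with a leave-one-out Q² fitness and a Y-randomization check can be wired together. The function names (`fit_mlr`, `q2_loo`, `ga_select`, `y_randomization`) are hypothetical and are not the authors' implementation. The same is true of the fixed subset size `k`, the truncation selection, crossover and mutation scheme, the default parameters, and the synthetic data in the demo.

```python
import random

# Hypothetical GA-MLR sketch (pure Python); not the authors' implementation.

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_mlr(X, y):
    """Least-squares MLR with intercept via the normal equations; returns [b0, b1, ..., bk]."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    return solve(ata, aty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def q2_loo(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / sum((y_i - mean(y))^2)."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # refit without compound i
        press += (y[i] - predict(coef, X[i])) ** 2
    ybar = sum(y) / len(y)
    return 1.0 - press / sum((v - ybar) ** 2 for v in y)

def ga_select(D, y, k, pop_size=30, generations=40, p_mut=0.3, seed=0):
    """Search for the k columns of descriptor matrix D that maximise Q^2_LOO."""
    rng = random.Random(seed)
    p = len(D[0])
    cache = {}

    def fitness(subset):
        if subset not in cache:
            X = [[row[j] for j in subset] for row in D]
            cache[subset] = q2_loo(X, y)
        return cache[subset]

    pop = [tuple(sorted(rng.sample(range(p), k))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection; the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = set(rng.sample(sorted(set(a) | set(b)), k))  # crossover: k genes from both parents
            if rng.random() < p_mut:  # mutation: swap one descriptor for an unused one
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([j for j in range(p) if j not in child]))
            children.append(tuple(sorted(child)))
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)

def y_randomization(X, y, trials=10, seed=1):
    """Q^2_LOO of models refitted to shuffled responses (chance-correlation check)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        ys = y[:]
        rng.shuffle(ys)
        scores.append(q2_loo(X, ys))
    return scores

if __name__ == "__main__":
    # Synthetic data (assumption): 20 "solvents", 6 descriptors, response built from columns 0 and 3.
    rng = random.Random(42)
    D = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(20)]
    y = [0.5 + 2.0 * r[0] - 1.5 * r[3] for r in D]
    best, q2 = ga_select(D, y, k=2)
    print("selected columns:", best, "Q2_LOO:", round(q2, 4))
    print("Y-randomized Q2:", [round(s, 2) for s in y_randomization([[r[j] for j in best] for r in D], y)])
```

On this synthetic data the GA is expected to recover columns 0 and 3 with Q²_LOO close to 1, while the Y-randomized scores stay far lower. On real descriptor data neither outcome is guaranteed. This sketch scores subsets by the cross-validated Q² rather than the training R², which is one common choice because it rewards descriptors that predict left-out compounds. Stepwise selection instead adds or drops one descriptor at a time and can stop at a local optimum that a population-based search may escape.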
