A novel descriptor based on atom-pair properties

Background: Molecular descriptors have been widely used to predict biological activities and physicochemical properties, and to analyze chemical libraries on the basis of similarity. Fingerprints and properties are the descriptors generally used, but neither is ideal for these purposes. In some cases a fingerprint can distinguish two molecules that a property cannot, and vice versa. Constructing good predictive models is especially difficult when the training set is small. Herein, a novel descriptor that integrates the mutually compensating characteristics of fingerprints and properties is described. The format of this descriptor is unconventional: each molecule is represented by a two-dimensional array whose length along one dimension is variable. This format cannot be used as direct input to machine learning methods. Therefore, a distance between molecules has been newly defined so that machine learning techniques can be applied. The descriptor was evaluated on classification tasks using a support vector machine after the features of the descriptor had been optimized by a genetic algorithm.

Results: Because feature optimization is time-intensive, owing to the complicated calculation of distances between molecules, the optimization had to be stopped before it was completed. As a result, the new descriptor showed no remarkable improvement in classification results over other descriptors on any evaluation set used in this work. However, it also did not give extremely low accuracy on any set.

Conclusions: The novel descriptor proposed in this work can potentially be used to build highly accurate predictive models. This new descriptor concept is expected to be useful for developing novel predictive methods that combine quick training with high accuracy.
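To illustrate why a variable-length, two-dimensional descriptor needs its own distance, the following is a minimal sketch and not the distance defined in this work. It assumes each molecule is a list of rows (for example, one row per atom pair), each row holding the same number of property values. As a stand-in, it uses a symmetric mean nearest-row distance. All function names and the toy data are hypothetical.

```python
import math


def row_distance(a, b):
    """Euclidean distance between two equal-length property rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def directed_distance(mol_a, mol_b):
    """Mean, over rows of mol_a, of the distance to the nearest row of mol_b."""
    return sum(min(row_distance(r, s) for s in mol_b) for r in mol_a) / len(mol_a)


def molecule_distance(mol_a, mol_b):
    """Symmetric distance between two molecules with different row counts.

    Assumption: this averaging scheme is only an illustrative choice; the
    abstract does not specify the actual definition.
    """
    return 0.5 * (directed_distance(mol_a, mol_b) + directed_distance(mol_b, mol_a))


def distance_matrix(mols):
    """Pairwise distance matrix, usable by distance-based learners."""
    n = len(mols)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = molecule_distance(mols[i], mols[j])
    return d


# Toy molecules: three rows versus two rows, two property columns each.
mol_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mol_b = [[1.0, 0.0], [0.0, 1.0]]

matrix = distance_matrix([mol_a, mol_b])
```

Every pairwise distance requires a nested loop over the rows of both molecules, so the cost grows with the product of the row counts. This is consistent with the observation that distance calculation dominates the optimization time. A kernel derived from such a distance is not guaranteed to be positive semidefinite, which would need to be checked before use in a support vector machine.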
