Paper information - Machine learning methods in chemoinformatics

Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods‐based and focuses on some algorithms that chemoinformatics researchers frequently use; it makes no claim to be exhaustive. We concentrate on methods for supervised learning, that is, predicting the unknown property values of a test set of instances, usually molecules, from the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k‐Nearest Neighbors, and naïve Bayes classifiers. WIREs Comput Mol Sci 2014, 4:468–481.
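The supervised-learning setup described in the abstract can be illustrated with a minimal k-Nearest Neighbors sketch. It is not taken from the paper: the descriptor values, the class labels, and the function name `knn_predict` are hypothetical, and Euclidean distance is assumed purely for simplicity. Each test instance receives the majority label of its k closest training instances.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a class for `query` by majority vote among its k nearest training instances."""
    # Euclidean distance from the query to every training instance (an assumed metric)
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Count the labels of the k closest instances and return the most common one
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: two made-up numerical descriptors per molecule
train_X = [(1.0, 0.2), (1.2, 0.1), (0.9, 0.3), (3.0, 2.1), (3.2, 1.9), (2.8, 2.2)]
train_y = ["soluble", "soluble", "soluble", "insoluble", "insoluble", "insoluble"]

# Hypothetical test set whose property values are treated as unknown
test_X = [(1.1, 0.2), (3.1, 2.0)]
predictions = [knn_predict(train_X, train_y, q) for q in test_X]
print(predictions)  # ['soluble', 'insoluble']
```

In practice the descriptors would be computed from molecular structure, and k and the distance measure would be chosen by validation rather than fixed as here.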

John B. O. Mitchell
