Machine Learning Estimation of Atom Condensed Fukui Functions

To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained on databases of pre-calculated DFT values for ca. 23,000 atoms in organic molecules. The problem was approached both as the ranking of atom types with the Bradley-Terry (BT) model and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank the atoms in a molecule, and to classify atoms as having a high or low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types identified, with 93–94 % accuracy, the atom with the higher Fukui function in pairs of atoms from the same molecule whose Fukui functions differ by ≥0.1. In whole molecules, the atom with the highest Fukui function was recognized in ca. 50 % of the cases, and on average about 3 of the top 4 atoms were recognized in a shortlist of 4. Regression RF yielded predictions for test sets with R² = 0.68–0.69 and ranked atoms within a molecule better than the BT coefficients did. RF classified atoms as having a high or low Fukui function with a sensitivity of 55–61 % and a specificity of 94–95 %.
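The abstract does not say how the BT model was fitted, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that (winner, loser) atom-type pairs are built from atoms of the same molecule whose condensed Fukui functions differ by at least 0.1, fits BT coefficients with Hunter's minorization-maximization (MM) update, and then uses the coefficients to pick the higher-Fukui atom in a pair and to shortlist the top atoms of a molecule. The helper names (`pairs_from_molecule`, `fit_bradley_terry`, `prob_higher`, `pairwise_accuracy`, `shortlist`), the atom-type labels (`N_ar`, `C_ar`, `C_sp3`) and the toy data are hypothetical. The sphere-count descriptors and the Random Forest models are not shown.

```python
# Sketch only: helper names, atom-type labels and data are hypothetical, not from the paper.
import math
from collections import defaultdict
from itertools import combinations


def pairs_from_molecule(atoms, min_diff=0.1):
    """(winner, loser) atom-type pairs for one molecule.

    atoms: list of (atom_type, fukui_value); only pairs whose Fukui values
    differ by at least min_diff are kept (assumed filtering rule).
    """
    out = []
    for (ta, fa), (tb, fb) in combinations(atoms, 2):
        if abs(fa - fb) >= min_diff:
            out.append((ta, tb) if fa > fb else (tb, ta))
    return out


def fit_bradley_terry(comparisons, max_iter=1000, tol=1e-10):
    """Fit BT strengths with Hunter's (2003) MM update; return centred log-strengths.

    Assumes every atom type wins and loses at least once, otherwise the
    maximum-likelihood estimate does not exist.
    """
    wins = defaultdict(int)     # W_i: comparisons won by type i
    n_pairs = defaultdict(int)  # n_ij: comparisons between types i and j
    for winner, loser in comparisons:
        if winner == loser:
            continue  # two atoms of the same type carry no information for BT
        wins[winner] += 1
        n_pairs[frozenset((winner, loser))] += 1
    types = sorted({t for pair in n_pairs for t in pair})

    p = {t: 1.0 for t in types}  # strengths p_i = exp(lambda_i)
    for _ in range(max_iter):
        new_p = {}
        for i in types:
            denom = sum(n_pairs[frozenset((i, j))] / (p[i] + p[j])
                        for j in types if j != i)
            new_p[i] = wins[i] / denom
        # Strengths are identified only up to a common factor: rescale so the
        # geometric mean is 1, i.e. the log-strengths sum to zero.
        log_mean = sum(math.log(v) for v in new_p.values()) / len(new_p)
        new_p = {t: v / math.exp(log_mean) for t, v in new_p.items()}
        change = max(abs(new_p[t] - p[t]) for t in types)
        p = new_p
        if change < tol:
            break
    return {t: math.log(v) for t, v in p.items()}


def prob_higher(coef, a, b):
    """BT probability that an atom of type a has a higher Fukui function than type b."""
    return 1.0 / (1.0 + math.exp(coef[b] - coef[a]))


def pairwise_accuracy(pairs, coef):
    """Fraction of (true_winner, true_loser) pairs where the winner has the larger coefficient."""
    scored = [(w, l) for w, l in pairs if w != l]
    return sum(coef[w] > coef[l] for w, l in scored) / len(scored)


def shortlist(atom_types, coef, k=4):
    """Indices of the k atoms in one molecule with the largest BT coefficients."""
    order = sorted(range(len(atom_types)),
                   key=lambda idx: coef[atom_types[idx]], reverse=True)
    return order[:k]


# Hypothetical toy comparisons (invented counts, for illustration only).
comparisons = (
    [("N_ar", "C_ar")] * 3 + [("C_ar", "N_ar")]
    + [("C_ar", "C_sp3")] * 3 + [("C_sp3", "C_ar")]
    + [("N_ar", "C_sp3")] * 3 + [("C_sp3", "N_ar")]
)
coef = fit_bradley_terry(comparisons)
test_pairs = [("N_ar", "C_sp3"), ("C_ar", "N_ar"), ("C_ar", "C_sp3")]
print(coef)
print(pairwise_accuracy(test_pairs, coef))  # 2 of 3 pairs ranked correctly
print(shortlist(["C_sp3", "C_ar", "N_ar", "C_ar"], coef, k=2))  # [2, 1]
```

Because the BT coefficient depends only on the atom type, every atom of the same type receives the same score. This is one plausible reason why a per-atom regression model could rank atoms within a molecule better than type-level coefficients.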
