The application of Kriging and empirical Kriging based on the variables selected by SCAD.

The common approach to building a structure-activity/property relationship consists of three steps: first, determine the descriptors of the molecular structure; second, build a metamodel using a suitable mathematical method; and finally, evaluate the metamodel. Some existing methods can only select important variables from the candidates, while most metamodels explore only linear relationships between inputs and outputs. Other techniques can model more complicated relationships, but they may not be able to select important variables from a large number of candidates. In this paper, we propose to screen important variables by the smoothly clipped absolute deviation (SCAD) variable selection procedure, and then to apply the Kriging model and the empirical Kriging model to quantitative structure-activity/property relationship (QSAR/QSPR) research based on the selected variables. We demonstrate that the proposed procedure retains the virtues of both variable selection and the Kriging model.
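The two-stage idea can be illustrated with a minimal sketch; it is not the paper's implementation. The SCAD penalty and its closed-form thresholding rule follow Fan and Li (2001) with the customary a = 3.7. Everything else is an assumption made for illustration: the simulated design with orthonormal scaled columns (the only setting in which the thresholding rule solves the penalized least squares problem exactly), the values of `lam`, `theta`, and `nugget`, and the helper names `scad_penalty`, `scad_threshold`, and `kriging_predict`. The Kriging step is ordinary Kriging with a Gaussian correlation and a fixed `theta`; the empirical Kriging model is not covered here.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|), evaluated elementwise."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                           # |theta| <= lam
    quadratic = -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))
    constant = (a + 1) * lam**2 / 2                            # |theta| > a*lam
    return np.where(t <= lam, linear,
                    np.where(t <= a * lam, quadratic, constant))

def scad_threshold(z, lam, a=3.7):
    """SCAD estimate from a least squares estimate z (orthonormal design)."""
    z = np.asarray(z, dtype=float)
    absz = np.abs(z)
    soft = np.sign(z) * np.maximum(absz - lam, 0.0)            # |z| <= 2*lam
    middle = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)    # 2*lam < |z| <= a*lam
    return np.where(absz <= 2 * lam, soft,
                    np.where(absz <= a * lam, middle, z))      # unchanged if large

def kriging_predict(X, y, X_new, theta=1.0, nugget=1e-6):
    """Ordinary Kriging predictor with Gaussian correlation and fixed theta."""
    def corr(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-theta * d2)
    R = corr(X, X) + nugget * np.eye(len(X))   # small nugget for stability
    ones = np.ones(len(X))
    # Generalized least squares estimate of the constant mean.
    mu = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    weights = np.linalg.solve(R, y - mu)
    return mu + corr(X_new, X) @ weights

# Simulated data (hypothetical): 6 candidate descriptors, 2 of them active.
rng = np.random.default_rng(0)
n, p = 60, 6
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
X = np.sqrt(n) * Q                             # columns satisfy X.T @ X / n = I
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Stage 1: SCAD screening of the candidate variables.
beta_scad = scad_threshold(X.T @ y / n, lam=0.5)
selected = np.flatnonzero(beta_scad)
print("selected variables:", selected)

# Stage 2: Kriging on the selected variables only.
train, test = slice(0, 50), slice(50, n)
y_pred = kriging_predict(X[train][:, selected], y[train], X[test][:, selected])
print("held-out RMSE:", np.sqrt(np.mean((y_pred - y[test]) ** 2)))
```

In practice the design is rarely orthonormal and `lam` and `theta` would be chosen from the data (for example by cross-validation or maximum likelihood), so the closed-form screening step above would be replaced by an iterative SCAD solver.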
