A New Redundant Variable Pruning Approach: Minor Latent Variable Perturbation-PLS Used for QSAR Studies on Anti-HIV Drugs

A new approach, the minor latent variable perturbation (MLVP)-PLS method, is proposed for eliminating redundant variables from the multivariable data matrices encountered in QSAR studies. In the latent variable (LV) space, the minor LVs, which have small covariances, are formed mainly by linear combinations of the redundant variables, including both information-deficient and highly correlated ones, whereas the major LVs, which have large covariances, are contributed mainly by the informative variables. Deleting a minor LV amounts to a perturbation of the LV space. After such a perturbation the redundant variables are no longer well represented in the LV subspace, so their PLS regression coefficients vary strongly, while the informative variables remain well represented and their PLS regression coefficients stay relatively stable. MLVP-PLS uses this difference to discriminate between informative and redundant variables: it identifies and eliminates the redundant variables step by step, according to the relative variation of the PLS regression coefficients after each perturbation, and terminates the elimination process according to a set of proposed criteria. Applying the method to quantitative structure-activity relationship (QSAR) studies of TIBO derivatives as potential anti-HIV drugs demonstrates the feasibility and robustness of the approach, and gives deeper insight into the effect of different structural parameters on the bioactivity of TIBO derivatives.
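As a rough illustration of the perturbation idea described above, the following Python sketch fits a single-response PLS model by NIPALS, drops the last-extracted LV (assumed here to play the role of the minor LV), and ranks variables by the relative change in their regression coefficients. The function names (`pls1_coefs`, `relative_variation`, `mlvp_prune`), the relative-variation formula, and the fixed `n_keep` stopping rule are all assumptions made for illustration; the abstract does not give the exact formula or the termination criteria.

```python
import numpy as np


def pls1_coefs(X, y, n_lv):
    """Regression coefficients of a PLS1 (NIPALS) model with n_lv latent variables.

    X and y are mean-centered internally; coefficients refer to centered data.
    """
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    p = X.shape[1]
    W = np.zeros((p, n_lv))   # weight vectors
    P = np.zeros((p, n_lv))   # X loadings
    q = np.zeros(n_lv)        # y loadings
    for a in range(n_lv):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w            # score vector of the a-th LV
        tt = t @ t
        P[:, a] = Xk.T @ t / tt
        q[a] = yk @ t / tt
        W[:, a] = w
        Xk = Xk - np.outer(t, P[:, a])   # deflate X
        yk = yk - q[a] * t               # deflate y
    return W @ np.linalg.solve(P.T @ W, q)


def relative_variation(X, y, n_lv, eps=1e-12):
    """Relative change of each coefficient after deleting the last (minor) LV.

    Assumption: the last-extracted LV stands in for the 'minor' LV, and the
    relative variation is |b_perturbed - b_full| / |b_full|.
    """
    b_full = pls1_coefs(X, y, n_lv)
    b_pert = pls1_coefs(X, y, n_lv - 1)
    return np.abs(b_pert - b_full) / (np.abs(b_full) + eps)


def mlvp_prune(X, y, n_lv, n_keep):
    """Hypothetical pruning loop: repeatedly drop the most unstable variable.

    Stops when n_keep variables remain (a placeholder stopping rule, not the
    criteria proposed in the paper).
    """
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        k = min(n_lv, len(keep))
        if k < 2:             # need at least one LV left after the perturbation
            break
        rv = relative_variation(X[:, keep], y, k)
        keep.pop(int(np.argmax(rv)))
    return keep


if __name__ == "__main__":
    # Synthetic data: columns 0 and 1 carry the signal, columns 2 and 3 do not.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 4))
    y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=40)
    print("retained variable indices:", mlvp_prune(X, y, n_lv=3, n_keep=2))
```

In practice the number of LVs and the stopping rule would be chosen by validation, for example cross-validated prediction error, rather than fixed in advance as in this sketch.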
