Toward robust QSPR models: Synergistic utilization of robust regression and variable elimination

Regression approaches widely used to model quantitative structure–property relationships (QSPR), such as partial least squares (PLS) regression, are highly susceptible to outlying observations, which impair the predictive value of a model. Our aim is to compile homogeneous datasets as the basis for regression modeling by removing outlying compounds and applying variable selection. We investigate different approaches to creating robust, outlier‐resistant regression models for predicting the permeability of drug molecules. The objective is to combine the strengths of outlier detection and variable elimination to increase the predictive power of the regression models. In conclusion, outlier detection is employed to identify multiple homogeneous data subsets for regression modeling. © 2007 Wiley Periodicals, Inc. J Comput Chem 2008
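The general idea of pairing outlier removal with variable elimination can be illustrated with a minimal sketch. It is not the procedure used in the paper: ordinary least squares stands in for PLS, a median/MAD screen of the residuals stands in for a robust outlier detector, and the variable filter is a simplified, UVE-style jackknife stability test against appended random noise columns. The function names, the cutoff of 3.5, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def flag_outliers(X, y, cutoff=3.5):
    """Flag rows whose least-squares residual is extreme on a median/MAD scale.

    Stand-in for a robust outlier detector; cutoff=3.5 is an assumed value.
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    med = np.median(resid)
    mad = max(np.median(np.abs(resid - med)), 1e-12)  # guard against zero MAD
    robust_z = 0.6745 * (resid - med) / mad
    return np.abs(robust_z) > cutoff

def stable_variables(X, y, seed=0):
    """Keep variables whose jackknife coefficient stability beats pure noise.

    Simplified UVE-style filter: append as many random columns as there are
    real variables, refit leaving one row out at a time, and keep the real
    variables whose |mean/std| of coefficients exceeds the noise maximum.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X, rng.normal(size=(n, p))])
    coefs = np.empty((n, 2 * p))
    for i in range(n):
        rows = np.arange(n) != i                      # leave row i out
        c, *_ = np.linalg.lstsq(A[rows], y[rows], rcond=None)
        coefs[i] = c[1:]                              # drop the intercept
    stability = coefs.mean(axis=0) / coefs.std(axis=0)
    threshold = np.abs(stability[p:]).max()           # best noise column
    return np.abs(stability[:p]) > threshold

# Synthetic data (hypothetical): 3 informative descriptors, 3 irrelevant ones,
# and 3 planted outlying "compounds" in the first rows.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 6))
beta = np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0])
y = X @ beta + 0.1 * rng.normal(size=80)
y[:3] += 20.0

outlier_mask = flag_outliers(X, y)
kept = stable_variables(X[~outlier_mask], y[~outlier_mask])
print("flagged rows:", np.flatnonzero(outlier_mask))
print("kept variables:", np.flatnonzero(kept))
```

Screening for outliers before the variable filter matters in this sketch because the jackknife stability values are themselves computed from least-squares fits, which a few extreme responses can distort.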
