Towards better understanding of feature-selection or reduction techniques for Quantitative Structure–Activity Relationship models

Abstract: A Quantitative Structure–Activity Relationship (QSAR) is a linear or non-linear model that relates variations in molecular descriptors to variations in the biological activity of a series of active and/or inactive molecules. In this article, several feature-selection or reduction methods were each coupled with Partial Least Squares (PLS) modeling during the selection of features. A PLS model built on the entire set of molecular descriptors served as a reference against which the reliability and performance of the feature-selection methods were checked. Each method was evaluated on two data sets.
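The comparison described in the abstract, a PLS model on a selected subset of descriptors versus a reference PLS model on all descriptors, can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' procedure. The selection step is a simple correlation-ranking filter standing in for the methods studied in the article; the data are synthetic; and the function names (`pls1_fit`, `top_k_by_correlation`, `cross_validated_q2`), the number of latent variables, and the fold count are all hypothetical choices. Selection is repeated inside each cross-validation fold so that the held-out molecules never influence which descriptors are kept.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS via NIPALS; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # centred data, deflated in the loop
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                          # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                       # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                             # latent-variable scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        c = (yr @ t) / tt                      # y loading
        Xr = Xr - np.outer(t, p)               # deflate X and y
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:                                  # constant response: predict the mean
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # regression vector in descriptor space
    return coef, x_mean, y_mean


def top_k_by_correlation(X, y, k):
    """Stand-in filter: indices of the k descriptors most correlated with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.argsort(corr)[::-1][:k]


def cross_validated_q2(X, y, n_components, k_features=None, n_folds=5, seed=0):
    """Q2-type statistic (1 - PRESS/SS); k_features=None gives the reference model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    press, ss = 0.0, 0.0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        X_tr, y_tr = X[train_idx], y[train_idx]
        if k_features is None:
            cols = np.arange(X.shape[1])       # reference: all descriptors
        else:
            cols = top_k_by_correlation(X_tr, y_tr, k_features)  # select on training fold only
        a = min(n_components, len(cols))
        coef, x_mean, y_mean = pls1_fit(X_tr[:, cols], y_tr, a)
        pred = (X[test_idx][:, cols] - x_mean) @ coef + y_mean
        press += np.sum((y[test_idx] - pred) ** 2)
        ss += np.sum((y[test_idx] - y_tr.mean()) ** 2)
    return 1.0 - press / ss


if __name__ == "__main__":
    # Synthetic stand-in for a descriptor matrix: 60 "molecules", 30 "descriptors",
    # of which only the first three carry signal.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 30))
    y = X[:, :3] @ np.array([1.5, -2.0, 1.0]) + 0.1 * rng.normal(size=60)
    print("reference PLS, all descriptors:", cross_validated_q2(X, y, n_components=2))
    print("PLS after filter selection:   ", cross_validated_q2(X, y, n_components=2, k_features=5))
```

Comparing the two printed values mirrors the role of the reference model in the abstract: a selection method is only worth keeping if its cross-validated performance is at least comparable to that of the model built on the full descriptor set.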
