High‐dimensional QSAR classification model for anti‐hepatitis C virus activity of thiourea derivatives based on the sparse logistic regression model with a bridge penalty

This study addresses the high dimensionality of quantitative structure‐activity relationship (QSAR) classification modeling. It proposes a method that combines the sparse logistic regression model with a bridge penalty, both to select the descriptors that truly affect biological activity and to estimate the QSAR classification model, and applies it to classifying the anti‐hepatitis C virus activity of thiourea derivatives. Compared with other commonly used sparse methods, the proposed method gives better classification accuracy and a more interpretable model.
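
For orientation, a bridge-penalized logistic regression is usually written in the form below. This is the standard textbook formulation, not necessarily the exact objective used in the study; the symbols (n compounds, p descriptors, binary activity label y_i, descriptor vector x_i, tuning parameter \lambda, and bridge exponent \gamma) are notation assumed here and do not appear in the abstract, and the intercept is omitted for brevity.

```latex
\hat{\beta}
  = \arg\min_{\beta}
    \left\{
      -\sum_{i=1}^{n}
        \left[ y_i \, x_i^{\top}\beta
               - \log\!\left(1 + e^{x_i^{\top}\beta}\right) \right]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert^{\gamma}
    \right\},
\qquad \lambda \ge 0,\ \gamma > 0 .
```

In this form, \gamma = 1 gives the lasso penalty and \gamma = 2 the ridge penalty, while 0 < \gamma < 1 can shrink some coefficients exactly to zero, which is what makes descriptor selection possible. The abstract does not state which value of \gamma was used.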
