A new adaptive L1-norm for optimal descriptor selection of high-dimensional QSAR classification model for anti-hepatitis C virus activity of thiourea derivatives

Abstract

A high-dimensional quantitative structure–activity relationship (QSAR) classification model typically contains a large number of irrelevant and redundant descriptors. In this paper, a new descriptor selection method for estimating the QSAR classification model is proposed, in which a new weight is added inside the L1-norm penalty. Experimental results on classifying the anti-hepatitis C virus activity of thiourea derivatives show that the proposed method performs effectively and competitively against other existing penalized methods in terms of classification performance on both the training and the testing datasets. Moreover, the results of the stability test and the applicability domain analysis indicate that the resulting QSAR classification model is robust. These results suggest that the developed QSAR classification model could be employed in further high-dimensional QSAR classification studies.
