A QSAR classification model for neuraminidase inhibitors of influenza A viruses (H1N1) based on weighted penalized support vector machine

Abstract: Descriptor selection is widely used in chemometrics. Its aim is to select the subset of descriptors most relevant to the quantitative structure–activity relationship (QSAR) study under consideration. This paper proposes a new descriptor selection method for QSAR classification that introduces a new weight inside the L1-norm penalty of the penalized support vector machine. Experimental results on classifying neuraminidase inhibitors of influenza A viruses (H1N1) show that the proposed method is competitive with existing penalized methods in terms of both classification performance and the number of selected descriptors.
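
To make the idea of a weight inside the L1-norm concrete, the following is a minimal sketch of a weighted L1-penalized linear SVM, not the authors' implementation. It assumes a squared hinge loss and a simple proximal gradient solver, and minimizes (1/n) * sum_i max(0, 1 - y_i (b + x_i . beta))^2 + lam * sum_j w_j |beta_j|. The abstract does not specify how the weights are built, so the weights w_j, the toy data, and the function names below are all hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_weighted_l1_svm(X, y, weights, lam=0.1, step=0.05, n_iter=500):
    """Proximal gradient descent for a squared-hinge SVM with a weighted
    L1 penalty. Labels y must be coded as -1 / +1. The intercept b is
    not penalized. Returns (beta, b)."""
    n, p = len(X), len(X[0])
    beta, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            score = b + sum(bj * xj for bj, xj in zip(beta, xi))
            margin = 1.0 - yi * score
            if margin > 0.0:  # only margin violators contribute to the loss
                g = -2.0 * yi * margin / n
                grad_b += g
                for j in range(p):
                    grad[j] += g * xi[j]
        b -= step * grad_b
        # Descriptor j is shrunk by step * lam * w_j, so a larger weight
        # pushes that descriptor's coefficient to exactly zero sooner.
        beta = [
            soft_threshold(beta[j] - step * grad[j], step * lam * weights[j])
            for j in range(p)
        ]
    return beta, b


# Toy data (hypothetical): 4 compounds, 3 descriptors, classes coded -1 / +1.
X = [[1.0, 0.2, 0.5], [0.8, -0.1, -0.4], [-1.0, 0.1, 0.3], [-0.9, -0.2, -0.6]]
y = [1, 1, -1, -1]
# Hypothetical weights: a very large weight effectively excludes descriptor 3.
weights = [1.0, 1.0, 1e9]

beta, b = fit_weighted_l1_svm(X, y, weights)
selected = [j for j, bj in enumerate(beta) if bj != 0.0]
print("coefficients:", beta)
print("selected descriptor indices:", selected)
```

Descriptors whose coefficients end up exactly zero are dropped from the model, which is how a penalized SVM performs descriptor selection and classification in a single fit. In practice the weights would be derived from the data, and lam would be tuned, for example by cross-validation.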
