Feature selection based on graph Laplacian by using compounds with known and unknown activities

A semisupervised feature selection method based on the graph Laplacian (S2FSGL) is proposed for quantitative structure‐activity relationship (QSAR) models. The method uses an ℓ2,1‐norm and compounds with both known and unknown activities. Two graphs, G_unsup and G_sup, are constructed. To select the most important descriptors, the method combines the label information of compounds with known activities with the local structure of compounds with known and unknown activities. The weight matrix of graph G_unsup models this local structure. The ℓ2,1‐norm accounts for the correlation between different descriptors during descriptor selection. The performance of S2FSGL coupled with a kernel smoother model was evaluated on 2 QSAR data sets and compared with that of other feature selection methods. To evaluate the QSAR models and the selected descriptors, several different training and test sets were produced for each data set. A comparison of the statistical parameters of QSAR models built with the semisupervised method and those built with other feature selection methods showed that S2FSGL was superior in selecting the most relevant descriptors. The results indicate that using compounds with unknown activities alongside compounds with known activities can help in selecting the relevant descriptors of QSAR models.
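The two ingredients named above, the ℓ2,1‐norm and a graph Laplacian built from local structure, can be illustrated with a minimal Python sketch. It is not the S2FSGL implementation: the abstract does not specify how the graphs are weighted, so the binary k-nearest-neighbour weights, the unnormalized Laplacian L = D - S, and the names `l21_norm` and `knn_laplacian` are all assumptions made for illustration. The ℓ2,1‐norm of a matrix W is the sum of the Euclidean norms of its rows, so when rows correspond to descriptors, a penalty on it tends to push whole rows toward zero.

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def knn_laplacian(X, k=2):
    """Unnormalized Laplacian L = D - S of a symmetric k-NN graph.

    Assumption: binary (0/1) edge weights; the paper may weight edges differently.
    Rows of X are compounds, columns are descriptors.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between compounds.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)  # a compound is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    S = np.zeros((n, n))
    for i in range(n):
        S[i, nearest[i]] = 1.0
    S = np.maximum(S, S.T)  # symmetrize the neighbour relation
    D = np.diag(S.sum(axis=1))  # degree matrix
    return D - S

# Hypothetical coefficient matrix: 3 descriptors (rows), 2 outputs (columns).
W = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
row_scores = np.linalg.norm(W, axis=1)  # one score per descriptor
ranking = np.argsort(-row_scores)       # descriptors from largest to smallest score

# Synthetic data standing in for compounds with known and unknown activities.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
L = knn_laplacian(X, k=2)
```

In this toy example the second descriptor has an all-zero row in `W`, so it ranks last. The Laplacian uses only the descriptor matrix `X`, which is why compounds without measured activities can still contribute to it.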
