A novel synonymous processing method based on amino acid substitution matrices for the classification of G-protein-coupled receptors

Extracting informative features and filtering out redundant ones are key challenges that determine the overall classification performance for G-protein-coupled receptors (GPCRs). In this study, we address the feature synonym problem and put forward a novel feature knowledge mining strategy based on functional word clustering and integration. The strategy evaluates the independence of each candidate feature under an evolutionary hypothesis based on residue substitution matrices, clusters the candidate features, and fuses each cluster by retaining its main functional words. In doing so, it adds a layer between the feature extraction layer and the prediction layer. Four classic machine learning algorithms were combined with the proposed feature extraction method to classify GPCRs at all family levels. These classifiers achieved considerable performance on almost all evaluation criteria, which indicates the validity and superiority of the proposed molecular-evolution-based feature extraction method.
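The cluster-and-fuse idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the independence test, the clustering algorithm, or the fusion rule, so everything below is an assumption made for illustration. In particular, `TOY_SCORES` holds made-up values rather than a real BLOSUM or PAM matrix, "functional words" are taken to be fixed-length k-mers, two words are treated as synonyms when every aligned residue pair scores at or above a hypothetical `threshold`, clusters are formed by single linkage, and each cluster is fused onto its most frequent word.

```python
from collections import Counter
from itertools import combinations

# Toy substitution scores: illustrative values only, NOT a real BLOSUM/PAM matrix.
TOY_SCORES = {
    frozenset("IL"): 2, frozenset("IV"): 3, frozenset("LV"): 1,
    frozenset("KR"): 2, frozenset("DE"): 2,
}


def sub_score(a, b):
    """Score for substituting residue a with b (symmetric); unlisted pairs score -1."""
    if a == b:
        return 4
    return TOY_SCORES.get(frozenset((a, b)), -1)


def synonymous(w1, w2, threshold=1):
    """Assumed synonym rule: equal length and every aligned pair scores >= threshold."""
    return len(w1) == len(w2) and all(
        sub_score(a, b) >= threshold for a, b in zip(w1, w2)
    )


def count_words(seq, k):
    """Count overlapping k-mers ("words") in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cluster_words(words, threshold=1):
    """Single-linkage clustering of words via union-find over synonym pairs."""
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for w1, w2 in combinations(words, 2):
        if synonymous(w1, w2, threshold):
            parent[find(w1)] = find(w2)

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())


def fuse(counts, clusters):
    """Merge each cluster's counts onto its most frequent word (ties: larger string)."""
    fused = Counter()
    for cluster in clusters:
        main = max(cluster, key=lambda w: (counts[w], w))
        fused[main] = sum(counts[w] for w in cluster)
    return fused


counts = count_words("ILKILRVLK", k=2)
fused = fuse(counts, cluster_words(sorted(counts)))
```

Here `fused` has fewer entries than `counts` while preserving the total number of word occurrences, which is the sense in which the fusion step removes redundancy before the feature vector reaches a classifier. Single linkage can chain loosely related words into one cluster, so a stricter linkage or a higher threshold may be preferable in practice.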
