Filter-wrapper approach to feature selection of GPCR protein

Protein datasets have high-dimensional feature spaces. Many of these features may be noisy or unrelated to protein function, so appropriate features must be selected to improve the efficiency and performance of the classifier. Feature selection is an important step in any classification task. Filter methods retain only the features that are relevant to the class and avoid redundancy, while wrapper methods optimize the feature subset for better classification accuracy. This paper proposes a feature selection strategy for hierarchical classification of G-Protein-Coupled Receptors (GPCR) based on a hybrid of the correlation-based feature selection (CFS) filter and a genetic algorithm (GA) wrapper. The optimum features were then classified using the K-nearest neighbor algorithm. The approach reduced the number of features while achieving comparable classification accuracy at every level of the hierarchy. The results also show that integrating CFS and GA is capable of finding the optimum features for hierarchical protein classification.
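
The filter-then-wrapper idea described above can be illustrated with a minimal sketch for a single level of the hierarchy. It is not the authors' implementation. The following are assumptions made for illustration: the function names, the use of absolute Pearson correlation inside the CFS merit, the greedy forward search (CFS is often run with best-first search instead), the GA settings, seeding the GA with the full filtered subset, and leave-one-out K-nearest-neighbor accuracy as the wrapper fitness.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def cfs_merit(X, y, subset):
    """CFS merit k*r_cf / sqrt(k + k*(k-1)*r_ff) for a list of column indices."""
    k = len(subset)
    if k == 0:
        return 0.0
    cols = [[row[j] for row in X] for j in subset]
    r_cf = sum(abs(pearson(c, y)) for c in cols) / k  # mean feature-class correlation
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    r_ff = (sum(abs(pearson(cols[a], cols[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)  # mean feature-feature correlation
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_filter(X, y):
    """Filter stage (assumed greedy forward search): add features while merit improves."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        j = max(remaining, key=lambda c: cfs_merit(X, y, selected + [c]))
        merit = cfs_merit(X, y, selected + [j])
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected

def knn_loo_accuracy(X, y, subset, k=3):
    """Leave-one-out accuracy of a k-NN classifier restricted to `subset`."""
    if not subset:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        dists = sorted((sum((row[j] - other[j]) ** 2 for j in subset), y[m])
                       for m, other in enumerate(X) if m != i)
        votes = [label for _, label in dists[:k]]
        if max(set(votes), key=votes.count) == y[i]:
            correct += 1
    return correct / len(X)

def ga_wrapper(X, y, candidates, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Wrapper stage: evolve bit masks over `candidates`, fitness = k-NN accuracy."""
    rng = random.Random(seed)
    n = len(candidates)

    def decode(mask):
        return [c for c, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        return knn_loo_accuracy(X, y, decode(mask))

    # Start from the full filtered subset plus random masks (an assumed design choice).
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return decode(best), fitness(best)

if __name__ == "__main__":
    # Toy data: column 0 is informative, column 1 duplicates it, column 2 is constant.
    X = [[v, v, 5.0] for v in (0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    kept = cfs_filter(X, y)
    print("filter kept:", kept)
    print("wrapper result:", ga_wrapper(X, y, kept))
```

For a hierarchical problem such as GPCR classification, a sketch like this would presumably be run once per level of the hierarchy, with `y` holding the labels of that level.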
