Predicting functional residues of protein sequence alignments as a feature selection task

Determining which residues within a multiple alignment of protein sequences are most responsible for protein function is a difficult and important task in bioinformatics. Here, we show that this task can be cast as an instance of the standard feature selection (FS) problem, with each alignment position treated as a feature. We compare standard FS techniques with more specialised algorithms on a range of data sets backed by experimental evidence, and find that some standard algorithms perform as well as the specialised ones. We also discuss how considering the discriminating power of combinations of residue positions, rather than that of each position individually, has the potential to improve the performance of such algorithms.
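To make the framing concrete, the following is a minimal sketch of how alignment positions could be scored as features. It assumes a gap-free toy alignment with one functional class label per sequence and uses information gain as the filter criterion; the function names (`information_gain`, `rank_positions`, `pair_gain`) and the toy data are illustrative assumptions, not the algorithms or data sets evaluated in the paper.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one feature (e.g. one column)."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_positions(alignment, labels):
    """Score every alignment column and return (ranking, scores)."""
    columns = list(zip(*alignment))  # one tuple of residues per position
    scores = [information_gain(col, labels) for col in columns]
    ranking = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranking, scores


def pair_gain(alignment, i, j, labels):
    """Score positions i and j jointly by treating the residue pair as one feature."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    return information_gain(pairs, labels)


# Hypothetical toy data: four aligned sequences in two functional classes.
alignment = ["ACDA", "ACEA", "GCDT", "GCET"]
labels = ["x", "x", "y", "y"]
ranking, scores = rank_positions(alignment, labels)

# Hypothetical case where neither position is informative alone,
# but the two positions together determine the class.
xor_alignment = ["AA", "AG", "GA", "GG"]
xor_labels = ["x", "y", "y", "x"]
single = [information_gain(col, xor_labels) for col in zip(*xor_alignment)]
joint = pair_gain(xor_alignment, 0, 1, xor_labels)
```

In the first toy alignment, the fully conserved second position and the class-independent third position receive a score of zero, while the first and last positions separate the two classes. The second toy alignment illustrates the point about combinations: each position scores zero on its own, yet the pair is fully discriminating. Scoring all pairs grows quadratically with alignment length, so a practical method would need some search strategy or pre-filtering.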
