Enzyme classification using reactive motifs

Reactive motifs are short conserved subsequences discovered from the functional sites of enzyme sequences, and they can serve as an effective representation of those sequences. However, the lack of site information leads to low-coverage reactive motifs, so a motif generalisation method that draws on background knowledge is required to increase their coverage. We show that a fuzzy concept lattice (FCL) provides an efficient representation of both single-value and multi-value biological background knowledge, as well as efficient computational support for generalising reactive motifs. We also show that, compared with statistical and expert-based motifs, reactive motifs generalised using FCL and combined with an SVM classifier produce satisfactory accuracy in classifying new enzymes. Further, they improve the interpretability of the classification results and provide biologists with more biological evidence. All of the generalised reactive motifs are relevant to the functional sites, and the way they are combined to perform protein function is useful for numerous applications in bioinformatics.
