GENERATING FUZZY RULES FOR PROTEIN CLASSIFICATION

This paper considers the generation of interpretable fuzzy rules for assigning an amino acid sequence to the appropriate protein superfamily. Since the main objective of this classifier is the interpretability of its rules, we use the distribution of amino acids in the protein sequences as features. These features are the occurrence probabilities of six exchange groups in the sequences. To generate the fuzzy rules, we use modified versions of a common approach. The generated rules are simple and understandable, especially for biologists. To evaluate our fuzzy classifiers, we use four protein superfamilies from the UniProt database. Experimental results show that the generated fuzzy rules are comprehensible while achieving comparable classification accuracy.

Bioinformatics (4) is essentially the conceptualization of biology in terms of macromolecules and the application of informatics techniques to understand and organize the information associated with these molecules. It deals primarily with the application of computer and statistical techniques to the management of biological information. Because of the Human Genome Project and other similar efforts, a large amount of biological data is regularly collected. It is important to organize and annotate this massive amount of sequence data to maximize its utility. In this regard, DNA sequences are translated into protein sequences using standard bioinformatics tools. One of the resulting tasks is protein sequence classification, which determines the type or group of proteins to which an unknown protein sequence belongs. One benefit of this grouping is that molecular analysis can be carried out within a particular superfamily instead of on an individual protein sequence. A protein superfamily consists of protein sequences that are evolutionarily related and therefore functionally and structurally relevant to each other.
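As an illustration of the features described above, the occurrence probabilities of six exchange groups can be computed as in the following minimal sketch. The text does not list the members of each group; the grouping used here is the commonly cited Dayhoff-style partition and is an assumption, as are the names `EXCHANGE_GROUPS` and `exchange_group_features`.

```python
from collections import Counter

# ASSUMPTION: the text does not enumerate the six exchange groups.
# This is the commonly cited Dayhoff-style partition of amino acids.
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}

# Reverse lookup: one-letter residue code -> exchange group label.
RESIDUE_TO_GROUP = {
    residue: group
    for group, members in EXCHANGE_GROUPS.items()
    for residue in members
}


def exchange_group_features(sequence):
    """Return the relative frequency of each exchange group in a sequence.

    Residues outside the six groups (for example 'X' or 'B') are ignored,
    which is a choice made for this sketch only.
    """
    counts = Counter(
        RESIDUE_TO_GROUP[residue]
        for residue in sequence.upper()
        if residue in RESIDUE_TO_GROUP
    )
    total = sum(counts.values())
    if total == 0:
        return {group: 0.0 for group in EXCHANGE_GROUPS}
    return {group: counts[group] / total for group in EXCHANGE_GROUPS}
```

Each sequence is thereby reduced to six numbers that sum to one (when it contains at least one recognized residue), so a fuzzy rule only needs to refer to a handful of quantities that have a direct biological reading.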
Several approaches dealing with the protein classification problem have been proposed in the past. These include alignment of protein sequences (2), hidden Markov modeling (14), application of artificial neural networks (23, 24, 25), using

[1] Cathy H. Wu, et al. Neural networks and genome informatics, 2000.

[2] Andreas D. Baxevanis, et al. Bioinformatics - a practical guide to the analysis of genes and proteins, 2001, Methods of biochemical analysis.

[3] M. Madera, et al. A comparison of profile hidden Markov model procedures for remote homology detection, 2002, Nucleic Acids Res.

[4] Sanghamitra Bandyopadhyay, et al. An efficient technique for superfamily classification of amino acid sequences: feature extraction, fuzzy clustering and prototype selection, 2005, Fuzzy Sets Syst.

[5] David Haussler, et al. A Discriminative Framework for Detecting Remote Protein Homologies, 2000, J. Comput. Biol.

[6] Alioune Ngom, et al. Fast Protein Superfamily Classification Using Principal Component Null Space Analysis, 2005, Canadian Conference on AI.

[7] Eleazar Eskin, et al. The Spectrum Kernel: A String Kernel for SVM Protein Classification, 2001, Pacific Symposium on Biocomputing.

[8] Eghbal G. Mansoori, et al. A weighting function for improving fuzzy classification systems performance, 2007, Fuzzy Sets Syst.

[9] Heikki Mannila, et al. Fast Discovery of Association Rules, 1996, KDD 1996.

[10] M. O. Dayhoff, et al. A Model of Evolutionary Change in Proteins, 1978.

[11] Ralf Mikut, et al. Interpretability issues in data-based learning of fuzzy systems, 2005, Fuzzy Sets Syst.

[12] Cathy H. Wu, et al. The Universal Protein Resource (UniProt), 2004, Nucleic Acids Res.

[13] Hisao Ishibuchi, et al. Voting in fuzzy rule-based systems for pattern classification problems, 1999, Fuzzy Sets Syst.

[14] Antonio González Muñoz, et al. SLAVE: a genetic learning system based on an iterative approach, 1999, IEEE Trans. Fuzzy Syst.

[15] Dianhui Wang, et al. Extraction and Optimization of Fuzzy Protein Sequences Classification Rules Using GRBF Neural Networks, 2003.

[16] Hisao Ishibuchi, et al. Comparison of Heuristic Criteria for Fuzzy Rule Selection in Classification Problems, 2004, Fuzzy Optim. Decis. Mak.

[17] János Abonyi, et al. Learning fuzzy classification rules from labeled data, 2003, Inf. Sci.

[18] Dianhui Wang, et al. Protein sequence classification using extreme learning machine, 2005, Proceedings of the 2005 IEEE International Joint Conference on Neural Networks.

[19] J. Ross Quinlan, et al. Improved Use of Continuous Attributes in C4.5, 1996, J. Artif. Intell. Res.

[20] Dennis Shasha, et al. New techniques for extracting features from protein sequences, 2001, IBM Syst. J.

[21] H. Ishibuchi, et al. Distributed representation of fuzzy rules and its application to pattern classification, 1992.

[22] Heikki Mannila, et al. Fast Discovery of Association Rules, 1996, Advances in Knowledge Discovery and Data Mining.

[23] W. Pedrycz. Why triangular membership functions?, 1994.

[24] Eghbal G. Mansoori, et al. Weighting fuzzy classification rules using receiver operating characteristics (ROC) analysis, 2007, Inf. Sci.

[25] Hisao Ishibuchi, et al. Rule weight specification in fuzzy rule-based classification systems, 2005, IEEE Trans. Fuzzy Syst.

[26] Eghbal G. Mansoori, et al. Using distribution of data to enhance performance of fuzzy classification systems, 2007.