Incremental Learning for Classification of Protein Sequences

Protein structural family classification remains a core problem in computational biology, with applications in drug discovery programs and the annotation of hypothetical proteins. Many machine learning tools applied to this problem rely on static structures such as neural networks or support vector machines, which cannot incorporate new information into an existing model. We use the fuzzy ARTMAP as an alternative machine learning system that can learn new data incrementally as it becomes available. The fuzzy ARTMAP is found to be comparable to many widely used machine learning systems. An evolutionary strategy for selecting and combining individual classifiers into an ensemble, coupled with the incremental learning ability of the fuzzy ARTMAP, is shown to be suitable as a pattern classifier. The algorithm is tested on data from the G-Protein Coupled Receptors Database and achieves an accuracy of 83%.
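To make the idea of incremental learning concrete, the following is a minimal sketch of a simplified fuzzy ARTMAP classifier. It is not the implementation evaluated here, and it does not cover the evolutionary ensemble selection. The class name, the method names (`partial_fit`, `predict`), the parameter defaults, and the toy two-dimensional data are all assumptions made for illustration; features are assumed to be already scaled to [0, 1].

```python
import numpy as np


class SimplifiedFuzzyARTMAP:
    """Illustrative simplified fuzzy ARTMAP (hypothetical names and defaults)."""

    def __init__(self, rho=0.0, alpha=0.001, beta=1.0, eps=1e-6):
        self.rho = rho      # baseline vigilance
        self.alpha = alpha  # choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.eps = eps      # match-tracking increment
        self.w = []         # one weight vector per category
        self.labels = []    # class label attached to each category

    @staticmethod
    def _code(a):
        # Complement coding: a -> [a, 1 - a]
        a = np.asarray(a, dtype=float)
        return np.concatenate([a, 1.0 - a])

    def _choice(self, i):
        # Choice function T_j = |i ^ w_j| / (alpha + |w_j|), ^ = element-wise min
        return [np.minimum(i, w).sum() / (self.alpha + w.sum()) for w in self.w]

    def partial_fit(self, X, y):
        """Learn new samples without revisiting earlier training data."""
        for a, label in zip(X, y):
            i = self._code(a)
            rho = self.rho
            # Visit categories from highest to lowest choice value
            for j in np.argsort(self._choice(i))[::-1]:
                overlap = np.minimum(i, self.w[j])
                match = overlap.sum() / i.sum()
                if match < rho:
                    continue  # fails the vigilance test
                if self.labels[j] == label:
                    # Resonance with the correct class: update the category
                    self.w[j] = self.beta * overlap + (1 - self.beta) * self.w[j]
                    break
                # Wrong class: raise vigilance just above this match (match tracking)
                rho = match + self.eps
            else:
                # No existing category accepted the sample: commit a new one
                self.w.append(i.copy())
                self.labels.append(label)
        return self

    def predict(self, X):
        # Assign each sample the label of its highest-choice category
        return [self.labels[int(np.argmax(self._choice(self._code(a))))] for a in X]


# Toy data (assumed): two classes first, a third class arrives later.
model = SimplifiedFuzzyARTMAP()
model.partial_fit([[0.1, 0.1], [0.15, 0.12], [0.9, 0.9], [0.85, 0.88]],
                  ["A", "A", "B", "B"])
model.partial_fit([[0.1, 0.9], [0.12, 0.85]], ["C", "C"])
print(model.predict([[0.1, 0.1], [0.9, 0.9], [0.1, 0.9]]))
```

The second `partial_fit` call adds a new class by committing a new category, while the categories learned for the earlier classes are left in place. This is the property that distinguishes the approach from static models that must be retrained from scratch.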
