An Adaptive Strategy for the Classification of G-Protein Coupled Receptors

One of the major problems in computational biology is that existing classification models cannot incorporate new or expanding domain knowledge. This paper addresses the problem of static classification models by introducing incremental learning to problems in bioinformatics. Many of the machine learning tools applied to this problem rely on static structures, such as neural networks or support vector machines, that cannot accommodate new information into their existing models. We use fuzzy ARTMAP as an alternative machine learning system that can learn new data incrementally as it becomes available. Fuzzy ARTMAP is found to perform comparably to many widely used machine learning systems. An evolutionary strategy for selecting and combining individual classifiers into an ensemble, coupled with the incremental learning ability of fuzzy ARTMAP, is shown to be suitable for pattern classification. The algorithm is tested on data from the G Protein-Coupled Receptor Database (GPCRDB) and achieves an accuracy of 83%. The system is also general and can be applied to other problems in genomics and proteomics.
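To illustrate the incremental learning property the approach depends on, here is a minimal sketch of a *simplified* fuzzy ARTMAP classifier: complement coding, the category choice function, the vigilance test, and match tracking. It is not the authors' implementation and does not include the evolutionary ensemble step. The class and method names (`SimplifiedFuzzyARTMAP`, `partial_fit`), the parameter defaults, and the toy data are hypothetical. It also assumes each receptor sequence has already been converted into a feature vector scaled to [0, 1], since the feature extraction is not described here.

```python
from typing import List, Optional


def complement_code(x: List[float]) -> List[float]:
    """Complement coding I = [x, 1 - x]; the L1 norm of I is always len(x)."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a: List[float], b: List[float]) -> List[float]:
    """Fuzzy AND: component-wise minimum."""
    return [min(p, q) for p, q in zip(a, b)]


class SimplifiedFuzzyARTMAP:
    """Hypothetical minimal simplified fuzzy ARTMAP for labelled vectors in [0, 1]."""

    def __init__(self, rho: float = 0.75, alpha: float = 0.001,
                 beta: float = 1.0, epsilon: float = 0.001) -> None:
        self.rho = rho          # baseline vigilance (higher -> more, tighter categories)
        self.alpha = alpha      # choice parameter
        self.beta = beta        # learning rate; 1.0 is "fast learning"
        self.epsilon = epsilon  # match-tracking increment
        self.weights: List[List[float]] = []  # one weight vector per category
        self.labels: List[str] = []           # class label attached to each category

    def _ranked(self, inp: List[float]) -> List[int]:
        """Category indices sorted by choice value T_j = |I ^ w_j| / (alpha + |w_j|)."""
        t = [sum(fuzzy_and(inp, w)) / (self.alpha + sum(w)) for w in self.weights]
        return sorted(range(len(t)), key=lambda j: t[j], reverse=True)

    def partial_fit(self, x: List[float], label: str) -> None:
        """Learn one labelled sample; earlier samples are never revisited."""
        inp = complement_code(x)
        rho = self.rho  # vigilance is reset to its baseline for every input
        for j in self._ranked(inp):
            w = self.weights[j]
            match = sum(fuzzy_and(inp, w)) / sum(inp)
            if match < rho:
                continue  # fails the vigilance test: try the next category
            if self.labels[j] == label:
                # Resonance: move w_j toward I ^ w_j
                self.weights[j] = [self.beta * m + (1.0 - self.beta) * wi
                                   for m, wi in zip(fuzzy_and(inp, w), w)]
                return
            # Wrong class: match tracking raises vigilance just above this
            # category's match value, then the search continues
            rho = match + self.epsilon
        # No existing category accepted the input: commit a new one
        self.weights.append(inp)
        self.labels.append(label)

    def predict(self, x: List[float]) -> Optional[str]:
        """Label of the category with the highest choice value (None if untrained)."""
        ranked = self._ranked(complement_code(x))
        return self.labels[ranked[0]] if ranked else None


if __name__ == "__main__":
    # Toy 2-feature vectors with hypothetical family labels (not GPCRDB data)
    model = SimplifiedFuzzyARTMAP(rho=0.6)
    model.partial_fit([0.1, 0.2], "family_A")
    model.partial_fit([0.9, 0.8], "family_B")
    print(model.predict([0.15, 0.25]))  # family_A
    # A sample from a previously unseen family is added incrementally
    model.partial_fit([0.5, 0.1], "family_C")
    print(model.predict([0.5, 0.1]), model.predict([0.1, 0.2]))  # family_C family_A
```

Because `partial_fit` processes one sample at a time and never retrains from scratch, a new family simply commits a new category while existing categories are left intact. In the toy run, `family_A` is still predicted for its original sample after `family_C` has been learned. The vigilance `rho` controls the trade-off: higher values create more, narrower categories.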
