Finding Patterns in Protein Sequences by Using a Hybrid Multiobjective Teaching Learning Based Optimization Algorithm

Proteins are molecules that form the mass of living beings. They exist in dissociated forms such as amino acids and carry out various biological functions; in fact, almost all reactions in the body involve proteins. This is one of the reasons why protein analysis has become a major issue in biology. More concretely, identifying conserved patterns in a set of related protein sequences can provide relevant biological information about the functions of those proteins. In this paper, we present a novel algorithm based on teaching-learning-based optimization (TLBO), combined with a local search function specialized in predicting common patterns in sets of protein sequences. This population-based evolutionary algorithm defines a group of individuals (solutions) that enhance their knowledge (quality) through different learning stages. If it is correctly adapted to the biological context of this problem, it can therefore yield an acceptable set of quality solutions. To evaluate the performance of the proposed technique, we have used six instances composed of different related protein sequences obtained from the PROSITE database. As we will see, the designed approach makes good predictions and improves on the quality of the solutions found by other well-known biological tools.
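
To make the idea of "learning stages" concrete, the following is a minimal sketch of the basic, single-objective TLBO scheme on a continuous toy problem. It is not the hybrid multiobjective algorithm proposed in this paper: the objective `sphere`, the function name `tlbo`, the bounds, and all parameter values are illustrative assumptions, and the paper's local search and motif encoding are not modelled. Each learner first moves toward the best individual (teacher phase) and then learns from a randomly chosen classmate (learner phase), keeping a new position only if it improves.

```python
import random


def sphere(x):
    """Toy objective to minimize (a hypothetical stand-in for a pattern-quality score)."""
    return sum(v * v for v in x)


def tlbo(objective, dim, pop_size=20, iterations=100, lower=-5.0, upper=5.0, seed=0):
    """Basic single-objective TLBO sketch; returns (best_solution, best_fitness)."""
    rng = random.Random(seed)

    def clip(v):
        # Keep every coordinate inside the assumed search bounds.
        return max(lower, min(upper, v))

    # Initial class of learners (candidate solutions) and their fitness values.
    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]

    for _ in range(iterations):
        # Teacher phase: the best learner pulls the others away from the class mean.
        teacher = pop[min(range(pop_size), key=fit.__getitem__)][:]
        mean = [sum(x[d] for x in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.randint(1, 2)  # teaching factor, either 1 or 2
            cand = [clip(pop[i][d] + rng.random() * (teacher[d] - tf * mean[d]))
                    for d in range(dim)]
            cand_fit = objective(cand)
            if cand_fit < fit[i]:  # greedy acceptance
                pop[i], fit[i] = cand, cand_fit

        # Learner phase: each learner interacts with one random classmate.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            # Move away from a worse classmate, or toward a better one.
            sign = 1.0 if fit[i] < fit[j] else -1.0
            cand = [clip(pop[i][d] + sign * rng.random() * (pop[i][d] - pop[j][d]))
                    for d in range(dim)]
            cand_fit = objective(cand)
            if cand_fit < fit[i]:
                pop[i], fit[i] = cand, cand_fit

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


if __name__ == "__main__":
    solution, value = tlbo(sphere, dim=3)
    print(solution, value)
```

Because a new position is accepted only when it improves the learner's fitness, the best fitness in the population never gets worse from one iteration to the next. A multiobjective variant would replace the scalar comparison with a dominance-based one, which this sketch does not attempt.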
