Comparison and fusion model in protein motifs

Motifs are useful in biology for highlighting the nucleotides or amino acids involved in structure, function, regulation and evolution, and for inferring homology between genes or proteins. The PROSITE database models protein motifs as regular expressions and position frequency matrices. Many tools have been proposed to discover biological motifs, but not to solve the motif comparison problem, which is NP-complete because each motif position is flexible and independent of the others. In this paper we present a formal model for comparing two protein motifs. The model uses genetic programming to generate the population of sequences derived from each regular expression under comparison, and a backpropagation neural network to calculate a motif similarity score that serves as the fitness function. Additionally, we present a formal method, based on ant colony optimization, for fusing two similar motifs. The comparison and fusion methods were tested on amyloid protein motifs.
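The idea of comparing two regular-expression motifs through populations of sequences derived from them can be illustrated with a much simpler stand-in. The sketch below is not the genetic-programming and neural-network model described above. It is a hypothetical Monte Carlo cross-matching baseline that samples random sequences from each PROSITE-style pattern and measures how often the sequences from one motif fully match the other. It assumes a simplified subset of PROSITE syntax: residues, `x`, `[...]`, `{...}` and `(n)` or `(n,m)` repeats, with terminal anchors dropped and `>` inside brackets unsupported. All function names are illustrative.

```python
import random
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One simplified PROSITE element: residue, 'x', [ALT] or {EXCL}, optional (n) or (n,m)
ELEMENT = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def parse_pattern(pattern):
    """Turn a simplified PROSITE pattern into (allowed_residues, min_rep, max_rep) tuples.

    Hypothetical helper: terminal anchors '<' and '>' are simply dropped.
    """
    elements = []
    for token in pattern.rstrip(".").strip("<>").split("-"):
        m = ELEMENT.match(token)
        if m is None:
            raise ValueError(f"unsupported element: {token!r}")
        core, lo, hi = m.groups()
        if core == "x":
            allowed = AMINO_ACIDS
        elif core.startswith("["):
            allowed = core[1:-1]
        elif core.startswith("{"):
            allowed = "".join(a for a in AMINO_ACIDS if a not in core[1:-1])
        else:
            allowed = core
        lo = int(lo) if lo is not None else 1
        hi = int(hi) if hi is not None else lo
        elements.append((allowed, lo, hi))
    return elements


def to_regex(elements):
    """Compile parsed elements into a Python regular expression."""
    parts = []
    for allowed, lo, hi in elements:
        rep = f"{{{lo}}}" if lo == hi else f"{{{lo},{hi}}}"
        parts.append(f"[{allowed}]{rep}")
    return re.compile("".join(parts))


def sample(elements, rng):
    """Draw one random sequence that the pattern accepts."""
    return "".join(
        rng.choice(allowed)
        for allowed, lo, hi in elements
        for _ in range(rng.randint(lo, hi))
    )


def cross_match_similarity(pattern_a, pattern_b, n=1000, seed=0):
    """Symmetric score in [0, 1]: mean fraction of each motif's samples matched by the other."""
    rng = random.Random(seed)
    ea, eb = parse_pattern(pattern_a), parse_pattern(pattern_b)
    ra, rb = to_regex(ea), to_regex(eb)
    hits_ab = sum(rb.fullmatch(sample(ea, rng)) is not None for _ in range(n))
    hits_ba = sum(ra.fullmatch(sample(eb, rng)) is not None for _ in range(n))
    return (hits_ab + hits_ba) / (2 * n)


if __name__ == "__main__":
    print(cross_match_similarity("C-x(2,4)-C-x(3)-[LIVMFYWC]", "C-x(2,4)-C-x(3)-[LIVM]"))
```

Identical motifs score 1.0 and motifs with disjoint residues score 0.0. A motif nested inside a broader one, such as `C-[AG]-C` inside `C-x-C`, lands in between. A learned similarity function like the paper's neural network would replace this crude count.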
