Protein Motif Prediction by Grammatical Inference

The rapid growth of protein sequence databases is outpacing the capacity to characterize new proteins biochemically and structurally. It is therefore very important to develop tools that locate, within protein sequences, those subsequences with an associated function or specific feature. In this work, we propose a method to predict one such functional motif, the coiled coil, which is related to protein interaction. Our approach uses even linear language inference to obtain a transducer, which is then used to label unknown sequences. The experiments carried out show that our method outperforms previous approaches.
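As background for the even linear language inference mentioned above, the following minimal sketch shows the standard reduction that lets even linear languages be handled with regular-language techniques: a string is "folded" by pairing its first symbol with its last, its second with its second-to-last, and so on, leaving an unpaired middle symbol when the length is odd. The function names `fold` and `unfold` and the sample sequence are hypothetical, chosen only for illustration; whether the authors' implementation uses exactly this form is an assumption.

```python
def fold(sequence):
    """Pair symbols from both ends, moving inward.

    Returns a list of tuples: 2-tuples for paired symbols and,
    for odd-length input, a final 1-tuple holding the middle symbol.
    """
    symbols = []
    i, j = 0, len(sequence) - 1
    while i < j:
        symbols.append((sequence[i], sequence[j]))
        i += 1
        j -= 1
    if i == j:
        # Odd length: the middle symbol has no partner.
        symbols.append((sequence[i],))
    return symbols


def unfold(symbols):
    """Invert fold: rebuild the original string from its folded form."""
    left, right = [], []
    for symbol in symbols:
        left.append(symbol[0])
        if len(symbol) == 2:
            right.append(symbol[1])
    return "".join(left) + "".join(reversed(right))


# Hypothetical amino acid fragment, used only to show the folding.
example = "LEKAL"
folded = fold(example)
print(folded)
print(unfold(folded) == example)
```

Once sequences are folded this way, a learner for regular languages can be run over the alphabet of paired symbols, and the result can be read back as a model over the original strings.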
