
A Similar Fragments Merging Approach to Learn Automata on Proteins

We propose to learn automata for the characterization of protein families, in order to overcome the limitations of the position-specific characterizations classically used in pattern discovery. We introduce a new heuristic approach for learning non-deterministic automata. It is based on selecting and ordering the significantly similar fragments to be merged, and on identifying physico-chemical properties. The quality of the characterization of the major intrinsic protein (MIP) family is assessed by leave-one-out cross-validation over a wide range of model specificities.
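The abstract only outlines the method. The sketch below is a rough illustration of the general idea of learning an automaton by merging similar fragments. It starts from one linear chain of states per training sequence. It then merges the states underlying pairs of similar, equal-length, gap-free fragments, taking the most similar pairs first.

Everything here is a simplifying assumption rather than the authors' algorithm:

* A plain identity count is a hypothetical stand-in for their significance measure.
* The names (`learn_nfa`, `merge_fragment`, `Partition`, `identity_score`) are hypothetical.
* The physico-chemical property step is omitted.
* A fragment pair is rejected if merging it would fold two different positions of the same training sequence into one state. This constraint is what makes the ordering matter in this sketch.

```python
import copy
from itertools import combinations


def consistent(members):
    """True if no training sequence contributes two different positions to one state."""
    seen = {}
    for seq, pos in members:
        if seen.setdefault(seq, pos) != pos:
            return False
    return True


class Partition:
    """Union-find over states (sequence_index, position), tracking each class's members."""

    def __init__(self):
        self.parent = {}
        self.members = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.members[x] = {x}
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def try_union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return True
        merged = self.members[rx] | self.members[ry]
        if not consistent(merged):
            return False  # would fold two positions of one sequence into one state
        self.parent[ry] = rx
        self.members[rx] = merged
        del self.members[ry]
        return True


def identity_score(u, v):
    # Hypothetical stand-in for a substitution-matrix based significance score.
    return sum(1 for x, y in zip(u, v) if x == y)


def similar_fragment_pairs(seqs, k, min_score):
    """All gap-free length-k fragment pairs across sequences, most similar first."""
    pairs = []
    for (a, sa), (b, sb) in combinations(enumerate(seqs), 2):
        for i in range(len(sa) - k + 1):
            for j in range(len(sb) - k + 1):
                score = identity_score(sa[i:i + k], sb[j:j + k])
                if score >= min_score:
                    pairs.append((score, a, i, b, j))
    pairs.sort(key=lambda p: -p[0])  # stable sort: ties keep discovery order
    return pairs


def merge_fragment(part, a, i, b, j, k):
    """Merge the k+1 aligned states of both fragments, or reject the pair entirely."""
    trial = copy.deepcopy(part)
    for t in range(k + 1):
        if not trial.try_union((a, i + t), (b, j + t)):
            return part
    return trial


def learn_nfa(seqs, k=3, min_score=3):
    part = Partition()
    for _, a, i, b, j in similar_fragment_pairs(seqs, k, min_score):
        part = merge_fragment(part, a, i, b, j, k)
    # Each sequence s is a chain of states 0..len(s); edges become class-to-class.
    transitions = set()
    for a, s in enumerate(seqs):
        for i, ch in enumerate(s):
            transitions.add((part.find((a, i)), ch, part.find((a, i + 1))))
    initial = {part.find((a, 0)) for a in range(len(seqs))}
    final = {part.find((a, len(s))) for a, s in enumerate(seqs)}
    return initial, transitions, final


def accepts(nfa, word):
    """Subset simulation of the non-deterministic automaton."""
    initial, transitions, final = nfa
    current = set(initial)
    for ch in word:
        current = {q2 for (q1, c, q2) in transitions if q1 in current and c == ch}
    return bool(current & final)
```

Consider `learn_nfa(["ACDE", "GCDF"], k=2, min_score=2)`. The shared `CD` fragment is merged, so the resulting automaton also accepts the unseen combinations `ACDF` and `GCDE`. This is a small instance of the generalization that state merging provides.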

François Coste | Goulven Kerbellec
