Paper information - Maximum Entropy Weighting of Aligned Sequences of Proteins or DNA

In a family of proteins or other biological sequences such as DNA, the various subfamilies are often very unevenly represented. For this reason, a scheme that assigns a weight to each sequence can greatly improve performance at tasks such as database searching with profiles or other consensus models based on multiple alignments. A new weighting scheme for this type of database search is proposed. It is derived from the maximum entropy principle within a statistical description of the search problem. It can be proved that, in a certain sense, the scheme corrects for uneven representation. Finding the maximum entropy weights is shown to be an easy optimization problem to which standard techniques apply.
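
The optimization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the objective, so the sketch assumes that the weights are non-negative, sum to one, and are chosen to maximize the total Shannon entropy of the weighted column frequencies of the alignment. The function names, the exponentiated-gradient update, the step size, and the toy alignment are all illustrative choices, not the authors' implementation. Gap characters are simply treated as one more symbol.

```python
import math
from collections import defaultdict


def column_frequencies(seqs, weights):
    """Weighted residue frequencies p_i(a) for every alignment column i."""
    n_cols = len(seqs[0])
    freqs = [defaultdict(float) for _ in range(n_cols)]
    for seq, w in zip(seqs, weights):
        for i, residue in enumerate(seq):
            freqs[i][residue] += w
    return freqs


def total_entropy(seqs, weights):
    """Sum over columns of the Shannon entropy of the weighted frequencies."""
    return -sum(
        p * math.log(p)
        for col in column_frequencies(seqs, weights)
        for p in col.values()
        if p > 0.0
    )


def max_entropy_weights(seqs, n_iter=500, learning_rate=0.5):
    """Hypothetical helper: exponentiated-gradient ascent on the simplex.

    Assumes equal-length (already aligned) sequences. The objective is
    concave in the weights, so a simple first-order method is enough here.
    """
    n_seqs, n_cols = len(seqs), len(seqs[0])
    weights = [1.0 / n_seqs] * n_seqs      # start from uniform weights
    step = learning_rate / n_cols          # scale the step by alignment length
    for _ in range(n_iter):
        freqs = column_frequencies(seqs, weights)
        # dH/dw_n = -sum_i (log p_i(x_i^n) + 1); the constant term is dropped
        # because the renormalization below cancels it.
        grads = [
            -sum(math.log(max(freqs[i][r], 1e-300)) for i, r in enumerate(seq))
            for seq in seqs
        ]
        top = max(grads)                   # shift for numerical stability
        weights = [w * math.exp(step * (g - top)) for w, g in zip(weights, grads)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


if __name__ == "__main__":
    # Toy alignment: one subfamily is over-represented three to one.
    alignment = ["AAAA", "AAAA", "AAAA", "CCCC"]
    print(max_entropy_weights(alignment))
```

On this toy alignment the three identical sequences end up sharing about half of the total weight, and the single divergent sequence receives the other half. This is one concrete reading of "correcting for uneven representation". The split of weight among identical sequences is not unique; the symmetric starting point is what keeps them equal in this sketch.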

Anders Krogh | Graeme J. Mitchison
