Banishing Bias from Consensus Sequences

As genome databases grow explosively, it is becoming increasingly important to devise search procedures that extract relevant information from them. One such procedure is particularly effective at finding new, distant members of a given family of related sequences: start with a multiple alignment of the known members of the family, derive an integral or fractional consensus sequence from the alignment, and use that consensus to probe the database further. However, the initial multiple alignment may be biased by skew in the sample of sequences used to construct it.
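To make the notion of sampling skew concrete, the following minimal sketch computes two simple consensus summaries from a toy alignment. It rests on assumptions that are not taken from the text: "integral consensus" is read here as the most frequent residue in each column, "fractional consensus" as the unweighted per-column residue frequencies, and the sequences and function names are hypothetical. It illustrates the problem only, not the correction method the paper goes on to develop.

```python
from collections import Counter


def fractional_consensus(alignment):
    """Unweighted per-column residue frequencies (assumed reading of 'fractional')."""
    n = len(alignment)
    return [
        {residue: count / n for residue, count in Counter(column).items()}
        for column in zip(*alignment)
    ]


def integral_consensus(alignment):
    """Most frequent residue per column (assumed reading of 'integral').

    Ties are broken alphabetically so the result is deterministic.
    """
    consensus = []
    for column in zip(*alignment):
        counts = Counter(column)
        best = min(counts, key=lambda r: (-counts[r], r))
        consensus.append(best)
    return "".join(consensus)


# Hypothetical toy data: one subfamily is sampled three times, the other once.
skewed = ["ACDEF", "ACDEF", "ACDEF", "GCHKF"]
# The same two subfamilies, each represented exactly once.
balanced = ["ACDEF", "GCHKF"]

print(integral_consensus(skewed))
print(fractional_consensus(skewed)[0])
print(fractional_consensus(balanced)[0])
```

In the skewed sample the first column is dominated by the over-represented subfamily (A at 0.75 versus G at 0.25), whereas the balanced sample gives both residues equal weight. A database probe built from the skewed profile would therefore favor relatives of the over-sampled subfamily.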
