Selecting the Right Protein‐Scoring Matrix

Every program for searching protein sequences against a database offers a choice of protein weight matrix, also called a scoring matrix. Weight matrices add sensitivity to the search, while statistical significance adds selectivity. Virtually every user accepts the default, typically PAM250 or BLOSUM62. Although the choice of matrix can strongly influence the outcome of the analysis, most users do not know why a particular matrix should be used. In general, a scoring matrix implicitly represents a particular theory of protein sequence evolution, so understanding the assumptions underlying the PAM and BLOSUM matrices can aid in making the proper choice. The purpose of this unit is to guide that choice. It covers the selection of PAM and BLOSUM matrices and provides a brief overview of the wide variety of specialized scoring matrices.
