The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions

MOTIVATION: Amino acid substitution matrices play a central role in protein alignment methods. Standard log-odds matrices, such as those of the PAM and BLOSUM series, are constructed from large sets of protein alignments that have implicit background amino acid frequencies. These matrices are nevertheless frequently used to compare proteins with markedly different amino acid compositions, such as transmembrane proteins or proteins from organisms with strongly biased nucleotide compositions. It has been argued elsewhere that standard matrices are not ideal for such comparisons, and a rationale has been presented for transforming a standard matrix for use in a non-standard compositional context.

RESULTS: This paper presents the mathematical details underlying the compositional adjustment of amino acid or DNA substitution matrices.
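To make the phrase "implicit background frequencies" concrete, the following is a minimal sketch, not the paper's adjustment procedure. It assumes the standard log-odds relation s_ij = ln(q_ij / (p_i p'_j)) / lambda, where q_ij are the target (aligned-pair) frequencies, p and p' are the background frequencies of the two sequences, and lambda is a scale factor. Given a score matrix and a pair of backgrounds, it solves for lambda by bisection and recovers the implied target frequencies. The function names, the toy four-letter DNA matrix, and the uniform background are illustrative assumptions.

```python
import math

def implied_lambda(scores, p, q, tol=1e-12):
    """Return the unique positive lambda with sum_ij p_i q_j exp(lambda*s_ij) = 1.

    Assumes the expected score under (p, q) is negative and at least one
    score is positive; otherwise no positive solution exists.
    """
    n, m = len(p), len(q)
    expected = sum(p[i] * q[j] * scores[i][j] for i in range(n) for j in range(m))
    if expected >= 0 or max(max(row) for row in scores) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    def f(lam):
        total = sum(p[i] * q[j] * math.exp(lam * scores[i][j])
                    for i in range(n) for j in range(m))
        return total - 1.0

    # f is convex with f(0) = 0 and f'(0) < 0, so it is negative on
    # (0, lambda*) and positive beyond lambda*. Bracket the root first.
    hi = 1.0
    while f(hi) <= 0:
        hi *= 2.0
    lo = hi
    while f(lo) >= 0:
        lo /= 2.0
    # Plain bisection on the bracket [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def implied_targets(scores, p, q, lam):
    """Target frequencies q_ij = p_i * q_j * exp(lambda * s_ij)."""
    return [[p[i] * q[j] * math.exp(lam * scores[i][j])
             for j in range(len(q))] for i in range(len(p))]

# Toy DNA matrix (assumption): +1 for a match, -1 for a mismatch.
scores = [[1 if i == j else -1 for j in range(4)] for i in range(4)]
background = [0.25] * 4

lam = implied_lambda(scores, background, background)
targets = implied_targets(scores, background, background, lam)
# For this toy case the equation reduces to 0.25*x + 0.75/x = 1 with
# x = exp(lambda), whose nontrivial root is x = 3, so lambda = ln 3.
print(lam, sum(map(sum, targets)))
```

If the same score matrix is applied to sequences with a different composition, the lambda and target frequencies it implies change as well, which is the mismatch that a compositional adjustment is meant to address.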
