Protein alignment: Exact versus approximate. An illustration

We illustrate an exact solution of the protein alignment problem using the VESPA algorithm (very efficient search for protein alignment) and compare the result with the approximate solution obtained with BLAST (basic local alignment search tool), currently the most widely used software for protein alignment searches. For the comparison we selected human and mouse proteins of about 170 amino acids. The exact solution found 78 pairs of amino acids, to which one should add 17 individually aligned amino acids, giving a total of 95 aligned amino acids. BLAST identified 64 aligned amino acids in groups of two or more adjacent amino acids. However, the difference between the two outputs is smaller than it appears, because BLAST reported a number of adjacent amino acids as single amino acids. Counting all aligned amino acids, whether isolated (single) or in a group of two or more, gives 89 for BLAST and 95 for VESPA, a difference of only six. © 2015 Wiley Periodicals, Inc.
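The counting convention used above (aligned amino acids in groups of adjacent residues, plus isolated ones, summed to a total) can be illustrated with a minimal Python sketch. This is neither VESPA nor BLAST: the function name `count_matches` and the two short sequences are hypothetical, and the sketch assumes the sequences are already aligned position by position, with `-` marking a gap.

```python
from itertools import groupby


def count_matches(seq_a: str, seq_b: str) -> dict:
    """Tally identical positions in two equal-length, pre-aligned sequences.

    Hypothetical helper for illustration only; it does not search for an
    alignment, it only counts matches in one that is already given.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")

    # True where both sequences carry the same residue (gaps never match).
    flags = [a == b and a != "-" for a, b in zip(seq_a, seq_b)]

    # Lengths of each unbroken run of matching positions.
    run_lengths = [
        sum(1 for _ in group)
        for is_match, group in groupby(flags)
        if is_match
    ]

    grouped = sum(n for n in run_lengths if n >= 2)   # residues in runs of 2+
    isolated = sum(1 for n in run_lengths if n == 1)  # single matched residues
    return {"grouped": grouped, "isolated": isolated, "total": grouped + isolated}


# Invented toy sequences, not the human and mouse proteins of the study.
result = count_matches("MKTAYIAKQR", "MKSAGIPKQR")
print(result)
```

Under this convention the figures quoted for VESPA correspond to 78 grouped plus 17 isolated, for a total of 95. Whether a residue counts as grouped or isolated depends on how runs are reported, which is why the grouped counts alone overstate the gap between the two programs.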
