Length Encoded Secondary Structure Profile for Remote Homologous Protein Detection

Protein data are growing explosively in both volume and diversity, yet many protein structures remain unresolved and their functions unidentified. Conventional sequence alignment tools are insufficient for remote homology detection, while current structural alignment tools run into difficulties with proteins of unresolved structure. Here, we aimed to overcome the combination of two major obstacles to detecting remote homologous proteins: proteins with unresolved structures, and proteins with low sequence identity but high structural similarity. We proposed a novel method for improving performance on the protein matching problem, especially for mining remote homologous proteins. In this study, existing secondary structure prediction techniques were applied to locate the secondary structure elements of proteins. The proposed LESS (Length Encoded Secondary Structure) profile was then constructed for segment-based similarity comparison using parallel computing. Compared with a conventional residue-based sequence alignment tool, detection of remote protein homologies through the LESS profile is favourable in terms of speed and in handling high sequence diversity, and its accuracy and performance can compensate for the deficiencies of traditional primary sequence alignment methods. This method may further support biologists in studying protein folding, evolution, and function prediction.
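
As a rough illustration of the length-encoding idea, the sketch below collapses a per-residue secondary structure prediction into a list of (state, length) segments. It is a minimal sketch under stated assumptions, not the LESS construction itself: the abstract does not specify the profile format, so the three-state alphabet (H for helix, E for strand, C for coil), the function name `length_encode`, and the example string are all hypothetical.

```python
from itertools import groupby


def length_encode(ss):
    """Collapse a per-residue secondary structure string into (state, length) segments.

    Assumption: `ss` uses one character per residue, e.g. H (helix),
    E (strand), C (coil). This alphabet is illustrative only.
    """
    return [(state, sum(1 for _ in run)) for state, run in groupby(ss)]


# Hypothetical predicted secondary structure for a 16-residue protein.
predicted = "CCHHHHHHCCEEEECC"
profile = length_encode(predicted)
print(profile)
# [('C', 2), ('H', 6), ('C', 2), ('E', 4), ('C', 2)]
```

In this example, 16 residues reduce to 5 segments, which suggests why a segment-based comparison can be faster than a residue-based alignment: the sequences being compared are much shorter.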
