Paper information - A context evaluation approach for structural comparison of proteins using cross entropy over n-gram modelling

The structural comparison of proteins is a vital step in structural biology, used to predict and analyse the function of a new, unknown protein. Although a number of different techniques have been explored, the development of alternative methods remains an active research area. The present paper introduces a text modelling-based technique for the structural comparison of proteins. The method encodes the secondary and tertiary structure of each protein as two linear sequences, which are then used to compare two structures. The pairwise comparison of the sequences is adopted from computational linguistics and its well-known techniques for analysing and quantifying textual sequences: an n-gram model captures the regularities in the sequences, and cross entropy is then used to measure their similarity. Several experiments are conducted to evaluate the performance of the method and to compare it with other commonly used programs. The information retrieval evaluation shows that the technique runs at a speed similar to that of other linear encoding methods, such as 3D-BLAST, SARST, and TS-AMIR, while its accuracy is comparable to that of CE and TM-align, which are high-accuracy comparison tools. Accordingly, the results demonstrate that the algorithm is highly efficient compared with other state-of-the-art methods.
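The idea of scoring one sequence against an n-gram model built from another can be illustrated with a minimal sketch. This is not the authors' implementation. The three-letter alphabet (`H`, `E`, `C`), the add-one smoothing, the start-padding symbol `^`, and the function names `ngram_counts`, `cross_entropy`, and `symmetric_distance` are all assumptions made for illustration; the abstract does not specify the encoding or the smoothing scheme.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count each n-gram in seq and the (n-1)-symbol context preceding it."""
    padded = "^" * (n - 1) + seq  # '^' is a hypothetical start-padding symbol
    grams = Counter(padded[i:i + n] for i in range(len(seq)))
    contexts = Counter(padded[i:i + n - 1] for i in range(len(seq)))
    return grams, contexts


def cross_entropy(model_seq, test_seq, n=3, alphabet="HEC"):
    """Average bits per symbol needed to encode test_seq with an n-gram
    model estimated from model_seq (add-one smoothing, an assumption)."""
    if not test_seq:
        raise ValueError("test_seq must not be empty")
    grams, contexts = ngram_counts(model_seq, n)
    vocab_size = len(alphabet)
    padded = "^" * (n - 1) + test_seq
    total_bits = 0.0
    for i in range(len(test_seq)):
        gram = padded[i:i + n]
        context = gram[:-1]
        # Smoothed conditional probability of the symbol given its context.
        prob = (grams[gram] + 1) / (contexts[context] + vocab_size)
        total_bits -= math.log2(prob)
    return total_bits / len(test_seq)


def symmetric_distance(seq_a, seq_b, n=3, alphabet="HEC"):
    """Average the two directed cross entropies; lower means more similar."""
    forward = cross_entropy(seq_a, seq_b, n, alphabet)
    backward = cross_entropy(seq_b, seq_a, n, alphabet)
    return (forward + backward) / 2


# Hypothetical secondary-structure strings (H = helix, E = strand, C = coil).
query = "HHHHHHHHCCEEEEEECC"
similar = "HHHHHHHCCCEEEEEECC"
unrelated = "ECHECHECHECHECHECH"
print(symmetric_distance(query, similar))
print(symmetric_distance(query, unrelated))
```

A lower value means that the model built from one sequence predicts the other well, so in this sketch `similar` should score closer to `query` than `unrelated` does. Averaging the two directions is one simple way to obtain a symmetric score; other combinations are equally plausible.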

Jafar Razmara | Safaai B. Deris | Sepideh Parvizpour
