Structural alignment of biomolecules by text modeling techniques

In the era of structural biology, efficient and effective tools are needed to compare and align the 3D structures of biomolecules. Although a great number of structural comparison and alignment methods have been developed, none of them gives an exact solution to the problem. In this paper, we introduce a novel method for the structural alignment of proteins based on language modelling techniques. The method summarizes the secondary and tertiary structure of a protein in two textual sequences. The first sequence is used for an initial superposition of secondary structure elements, and the second sequence is used to align the 3D structures of the two proteins being compared. To compare sequences, the method applies a technique from computational linguistics for analysing and comparing textual data: the cross-entropy measure over n-gram models is used to capture regularities between sequences of protein structures. Experiments were performed to compare the performance of the method with that of other structure alignment methods. The results provide evidence for the usefulness of the new approach and for its applicability relative to other related methods.
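As an illustration of the cross-entropy measure over n-gram models mentioned above, the following is a minimal sketch rather than the paper's implementation. It assumes a hypothetical three-letter encoding of secondary structure (H for helix, E for strand, C for coil), a trigram model, and add-one smoothing; none of these choices is stated in the abstract, and the function names are invented for the example.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous substrings of length n."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def train_ngram_model(seq, n):
    """Count n-grams and the (n-1)-symbol contexts that precede their last symbol."""
    grams = ngrams(seq, n)
    gram_counts = Counter(grams)
    context_counts = Counter(g[:-1] for g in grams)
    return gram_counts, context_counts


def cross_entropy(model_seq, test_seq, n=3, alphabet=None):
    """Average bits per n-gram needed to encode test_seq with a model of model_seq.

    Add-one smoothing is an assumption made here so that unseen n-grams
    receive a non-zero probability. Lower values mean the two sequences
    share more local regularities.
    """
    if alphabet is None:
        alphabet = sorted(set(model_seq) | set(test_seq))
    vocab_size = len(alphabet)
    gram_counts, context_counts = train_ngram_model(model_seq, n)
    test_grams = ngrams(test_seq, n)
    if not test_grams:
        raise ValueError("test sequence is shorter than n")
    total_bits = 0.0
    for g in test_grams:
        # Smoothed conditional probability of the last symbol given its context.
        p = (gram_counts[g] + 1) / (context_counts[g[:-1]] + vocab_size)
        total_bits -= math.log2(p)
    return total_bits / len(test_grams)


# Hypothetical secondary-structure strings (H = helix, E = strand, C = coil).
protein_a = "HHHHCCEEEECCHHHH"
protein_b = "ECECECECECECECEC"

self_score = cross_entropy(protein_a, protein_a)
other_score = cross_entropy(protein_a, protein_b)
```

In this sketch a sequence scored against a model of itself yields a lower cross-entropy than a dissimilar sequence does, which is the property a similarity measure of this kind relies on. The measure is asymmetric, so a practical comparison would likely combine both directions.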
