Finding All Longest Common Segments in Protein Structures Efficiently

Local/Global Alignment (Zemla, 2003), or LGA, is a popular method for comparing protein structures. One of the two components of LGA requires computing the longest common contiguous segments between two protein structures. That is, given two structures A = (a<sub>1</sub>, ..., a<sub>n</sub>) and B = (b<sub>1</sub>, ..., b<sub>n</sub>), where a<sub>k</sub>, b<sub>k</sub> ∈ ℝ<sup>3</sup>, we are to find, among all the segments f = (a<sub>i</sub>, ..., a<sub>j</sub>) and g = (b<sub>i</sub>, ..., b<sub>j</sub>) that fulfill a certain similarity criterion, those of maximum length. We consider the following criteria: (1) the root mean squared deviation (RMSD) between f and g is within a given t ∈ ℝ; (2) f and g can be superposed such that for each k, i ≤ k ≤ j, ||a<sub>k</sub> - b<sub>k</sub>|| ≤ t for a given t ∈ ℝ. We give an algorithm of O(n log n + nl) time complexity when the first criterion applies, where l is the maximum length of the segments fulfilling the criterion. We show an FPTAS which, for any ϵ ∈ ℝ, finds a segment of length at least l, but of RMSD up to (1 + ϵ)t, in O(n log n + n/ϵ) time. We propose an FPTAS which, for any given ϵ ∈ ℝ, finds all the segments f and g of maximum length which can be superposed such that for each k, i ≤ k ≤ j, ||a<sub>k</sub> - b<sub>k</sub>|| ≤ (1 + ϵ)t, thus fulfilling the second criterion approximately. The algorithm has a time complexity of O(n log<sup>2</sup> n/ϵ<sup>5</sup>) when consecutive points in A are separated by the same distance (which is the case with protein structures). These worst-case runtime complexities are verified using C++ implementations of the algorithms, which we have made available at http://alcs.sourceforge.net/.
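To make criterion (1) concrete, the following is a minimal Python sketch of a naive quadratic-time baseline. It is not the O(n log n + nl) algorithm described above, and it is unrelated to the C++ code at the URL given. It assumes the RMSD is taken after an optimal rigid superposition in the sense of Kabsch, and it uses prefix sums so that each segment's RMSD is evaluated in constant time. All function names (`det3`, `sym3_eigenvalues`, `prefix_sums`, `segment_rmsd`, `longest_common_segments`) are hypothetical, and indices are 0-based and inclusive, unlike the 1-based notation of the text.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def sym3_eigenvalues(m):
    """Eigenvalues (descending) of a symmetric 3x3 matrix, trigonometric closed form."""
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p1 = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    p2 = sum((m[k][k] - q) ** 2 for k in range(3)) + 2.0 * p1
    if p2 == 0.0:  # multiple of the identity
        return [q, q, q]
    p = math.sqrt(p2 / 6.0)
    b = [[(m[r][c] - (q if r == c else 0.0)) / p for c in range(3)]
         for r in range(3)]
    phi = math.acos(max(-1.0, min(1.0, det3(b) / 2.0))) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]

def prefix_sums(A, B):
    """Running sums of a, b, |a|^2, |b|^2 and the 9 products a_r * b_c."""
    acc = [0.0] * 17
    out = [acc]
    for a, b in zip(A, B):
        terms = (list(a) + list(b)
                 + [sum(x * x for x in a), sum(x * x for x in b)]
                 + [a[r] * b[c] for r in range(3) for c in range(3)])
        acc = [s + t for s, t in zip(acc, terms)]
        out.append(acc)
    return out

def segment_rmsd(P, i, j):
    """Minimum RMSD over rigid motions for positions i..j-1 (half-open)."""
    n = j - i
    s = [P[j][k] - P[i][k] for k in range(17)]
    sa, sb = s[0:3], s[3:6]
    ga = s[6] - sum(x * x for x in sa) / n  # centered sum of |a|^2
    gb = s[7] - sum(x * x for x in sb) / n  # centered sum of |b|^2
    # Centered cross-covariance matrix H = sum a b^T - n * mean(a) mean(b)^T.
    h = [[s[8 + 3 * r + c] - sa[r] * sb[c] / n for c in range(3)]
         for r in range(3)]
    hth = [[sum(h[k][r] * h[k][c] for k in range(3)) for c in range(3)]
           for r in range(3)]
    # Singular values of H are square roots of the eigenvalues of H^T H.
    sig = [math.sqrt(max(e, 0.0)) for e in sym3_eigenvalues(hth)]
    if det3(h) < 0.0:  # exclude reflections: flip the smallest singular value
        sig[2] = -sig[2]
    return math.sqrt(max(ga + gb - 2.0 * sum(sig), 0.0) / n)

def longest_common_segments(A, B, t):
    """Return (length, [(i, j), ...]) of all longest segments with RMSD <= t."""
    n = len(A)
    P = prefix_sums(A, B)
    for length in range(n, 0, -1):  # try the longest lengths first
        hits = [(i, i + length - 1) for i in range(n - length + 1)
                if segment_rmsd(P, i, i + length) <= t]
        if hits:
            return length, hits
    return 0, []
```

Calling `longest_common_segments(A, B, t)` with two equal-length lists of 3-D coordinate tuples returns the maximum length together with every index pair (i, j) that attains it. Because RMSD is not monotone under segment extension, the sketch checks every length explicitly, and the prefix-sum differences can lose precision for coordinates of large magnitude, so it should be read as an illustration rather than a production routine.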

[1] K. S. Arun, et al. Least-Squares Fitting of Two 3-D Point Sets, 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] D. T. Jones, et al. A new approach to protein fold recognition, 1992, Nature.

[3] Ronald L. Rivest, et al. Introduction to Algorithms, third edition, 2009.

[4] Arne Elofsson, et al. MaxSub: an automated measure for the assessment of protein structure prediction quality, 2000, Bioinform.

[5] Shuai Cheng Li, et al. Finding Nearly Optimal GDT Scores, 2011, J. Comput. Biol.

[6] Samarjit Chakraborty, et al. Computing Largest Common Point Sets under Approximate Congruence, 2000, ESA.

[7] J. Skolnick, et al. Ab initio modeling of small proteins by iterative TASSER simulations, 2007, BMC Biology.

[8] Arne Elofsson, et al. A study of quality measures for protein threading models, 2001, BMC Bioinformatics.

[9] S. Umeyama, et al. Least-Squares Estimation of Transformation Parameters Between Two Point Patterns, 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[10] W. Kabsch. A discussion of the solution for the best rotation to relate two sets of vectors, 1978.

[11] Adam Zemla, et al. LGA: a method for finding 3D similarities in protein structures, 2003, Nucleic Acids Res.

[12] Navin Goyal, et al. A Combinatorial Shape Matching Algorithm for Rigid Protein Docking, 2004, CPM.

[13] C. Kooperberg, et al. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions, 1997, Journal of Molecular Biology.

[14] D. Eisenberg, et al. A method to identify protein sequences that fold into a known three-dimensional structure, 1991, Science.

[15] S. Bryant, et al. Statistics of sequence-structure threading, 1995, Current Opinion in Structural Biology.

[16] Genki Terashi, et al. LB3D: A Protein Three-Dimensional Substructure Search Program Based on the Lower Bound of a Root Mean Square Deviation Value, 2012, J. Comput. Biol.

[17] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.

[18] Tatsuya Akutsu, et al. Protein Structure Alignment Using Dynamic Programing and Iterative Improvement, 1996.

[19] Tetsuo Shibuya, et al. Searching Protein Three-Dimensional Structures in Faster Than Linear Time, 2010, J. Comput. Biol.

[20] W. Kabsch. A solution for the best rotation to relate two sets of vectors, 1976.