On Sufficient Statistics of Least-Squares Superposition of Vector Sets

Superposing two vector sets by the orthogonal transformation that minimizes their least-squares error is a fundamental task in many areas of science, notably in structural molecular biology. Its widespread use in structural analyses is facilitated by exact solutions to this problem that are computable in time linear in the number of vectors. However, many of these analyses invoke the superposition routine a very large number of times, often on vector sets obtained by adding vectors to, or deleting vectors from, sets that have already been superposed. This paper derives a set of sufficient statistics for the least-squares orthogonal transformation problem. These sufficient statistics are additive, so the superposition parameters (rotation, translation, and root-mean-square deviation) can be computed as constant-time updates from the statistics of partial solutions. We demonstrate that this yields a massive speedup compared with recomputing each superposition ab initio. Protein structural alignment algorithms, among others, stand to benefit from our results.
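The additivity property can be illustrated with a minimal sketch. The abstract does not list the statistics themselves, so the choice below (pair count, coordinate sums, sums of squared norms, and the sum of outer products) is an assumption, as are the names `SuperpositionStats`, `update`, `merge`, `centered`, and `from_pairs`. The sketch only maintains the statistics and derives the centroid-corrected quantities from them; extracting the rotation and RMSD from the resulting 3x3 matrix (for example by a quaternion eigenproblem or a singular value decomposition) is a fixed-size step that is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class SuperpositionStats:
    """Running sums over corresponding pairs (x_i, y_i) of 3D vectors.

    Hypothetical container; the field choice is an assumption, not
    taken from the paper.
    """
    n: int = 0
    sum_x: list = field(default_factory=lambda: [0.0] * 3)
    sum_y: list = field(default_factory=lambda: [0.0] * 3)
    sum_xx: float = 0.0  # sum of |x_i|^2
    sum_yy: float = 0.0  # sum of |y_i|^2
    sum_xy: list = field(
        default_factory=lambda: [[0.0] * 3 for _ in range(3)]
    )  # sum of outer products x_i y_i^T

    def update(self, x, y, sign=1):
        """Add (sign=+1) or delete (sign=-1) one pair in constant time."""
        self.n += sign
        for i in range(3):
            self.sum_x[i] += sign * x[i]
            self.sum_y[i] += sign * y[i]
            for j in range(3):
                self.sum_xy[i][j] += sign * x[i] * y[j]
        self.sum_xx += sign * sum(c * c for c in x)
        self.sum_yy += sign * sum(c * c for c in y)

    def merge(self, other):
        """Combine the statistics of two disjoint pair sets by addition."""
        out = SuperpositionStats()
        out.n = self.n + other.n
        out.sum_x = [a + b for a, b in zip(self.sum_x, other.sum_x)]
        out.sum_y = [a + b for a, b in zip(self.sum_y, other.sum_y)]
        out.sum_xx = self.sum_xx + other.sum_xx
        out.sum_yy = self.sum_yy + other.sum_yy
        out.sum_xy = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.sum_xy, other.sum_xy)
        ]
        return out

    def centered(self):
        """Return (gx, gy, m): centered sums of squares and the centered
        cross-covariance matrix, computed from the raw sums alone."""
        n = self.n
        gx = self.sum_xx - sum(c * c for c in self.sum_x) / n
        gy = self.sum_yy - sum(c * c for c in self.sum_y) / n
        m = [
            [self.sum_xy[i][j] - self.sum_x[i] * self.sum_y[j] / n
             for j in range(3)]
            for i in range(3)
        ]
        return gx, gy, m


def from_pairs(pairs):
    """Accumulate statistics over an iterable of (x, y) pairs."""
    stats = SuperpositionStats()
    for x, y in pairs:
        stats.update(x, y)
    return stats
```

Under these assumptions, `from_pairs(a).merge(from_pairs(b))` carries the same information as `from_pairs(a + b)`, and calling `update(x, y, sign=-1)` removes a pair without revisiting the remaining ones, so the cost of each change does not depend on how many vectors are already superposed.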
