A hierarchical algorithm for molecular similarity (H‐FORMS)

A new hierarchical method to determine molecular similarity is introduced. The goal of this method is to detect whether a pair of molecules has the same structure by estimating a rigid transformation that aligns the molecules and a correspondence function that matches their atoms. The algorithm first detects similarity based on the global spatial structure. If this analysis is not sufficient, the algorithm computes novel rotation‐invariant local structural descriptors for each atom neighborhood and uses this information to match atoms. Two strategies (deterministic and stochastic) for the matching‐based alignment computation are tested. Atom matching based on local similarity indexes decreases the number of testing trials and significantly reduces the dimensionality of the Hungarian assignment problem. Experiments on well‐known datasets show that our proposal outperforms state‐of‐the‐art methods in terms of required computational time and accuracy. © 2015 Wiley Periodicals, Inc.
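
To illustrate how rotation‐invariant local descriptors can narrow down atom correspondences before any assignment problem is solved, the following minimal sketch may help. It is not the descriptor proposed in this work: as a stand‐in, it assumes each atom is described by the sorted distances to its k nearest neighbors, which are unchanged by rotation and translation. The function names (`sorted_distance_descriptor`, `candidate_matches`, `rigid_move`), the tolerance, and the toy coordinates are all hypothetical, and element types are ignored for brevity.

```python
import math

def sorted_distance_descriptor(coords, index, k):
    """Stand-in descriptor (an assumption, not the paper's):
    sorted distances from atom `index` to its k nearest neighbors."""
    origin = coords[index]
    dists = sorted(math.dist(origin, p)
                   for j, p in enumerate(coords) if j != index)
    return dists[:k]

def candidate_matches(coords_a, coords_b, k=3, tol=1e-6):
    """For each atom of A, list the atoms of B whose descriptors agree
    within `tol`. Short lists mean fewer pairings left to test."""
    desc_a = [sorted_distance_descriptor(coords_a, i, k)
              for i in range(len(coords_a))]
    desc_b = [sorted_distance_descriptor(coords_b, j, k)
              for j in range(len(coords_b))]
    candidates = []
    for da in desc_a:
        matches = [j for j, db in enumerate(desc_b)
                   if len(da) == len(db)
                   and all(abs(x - y) <= tol for x, y in zip(da, db))]
        candidates.append(matches)
    return candidates

def rigid_move(coords, angle, shift):
    """Rotate about the z axis by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

# Toy molecule with four atoms in distinct environments (hypothetical data).
molecule_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]

# Molecule B is a rotated, translated, and re-ordered copy of A:
# atom i of B is the moved atom permutation[i] of A.
permutation = [2, 0, 3, 1]
moved = rigid_move(molecule_a, angle=0.7, shift=(1.5, -2.0, 0.25))
molecule_b = [moved[p] for p in permutation]

candidates = candidate_matches(molecule_a, molecule_b)
print(candidates)
```

In this toy case every atom has a unique environment, so each candidate list contains a single atom and the correspondence is fixed without solving an assignment problem. For symmetric molecules several atoms share a descriptor, and only the remaining ambiguous groups would need to be resolved, for example with the Hungarian method.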
