Metrics for measuring distances in configuration spaces.

To characterize molecular structures, we introduce configurational fingerprint vectors, which are counterparts of quantities used experimentally to identify structures. The Euclidean distance between two configurational fingerprint vectors satisfies the properties of a metric and can therefore safely be used to measure dissimilarities between configurations in the high-dimensional configuration space. In particular, we show that these metrics are a perfect and computationally cheap replacement for the root-mean-square distance (RMSD) when one has to decide whether two noise-contaminated configurations are identical or not. We also introduce a Monte Carlo approach to find the global minimum of the RMSD between configurations, that is, its minimum over all translations, rotations, and permutations of atomic indices.
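As a minimal illustration of comparing configurations through fingerprint vectors, the sketch below uses the sorted list of interatomic distances as a stand-in fingerprint. This choice is an assumption made only for illustration and is not the fingerprint construction referred to above; it is used here because it is unchanged by translations, rotations, and permutations of atomic indices. The function names `fingerprint`, `fingerprint_distance`, `rotate_z`, and `translate` are hypothetical. Distinct structures can share the same set of interatomic distances, so this toy fingerprint may fail to separate some configurations.

```python
import math
import random
from itertools import combinations


def fingerprint(coords):
    """Illustrative fingerprint: all interatomic distances, sorted.

    Assumption for this sketch only; sorting makes the vector
    independent of the order in which the atoms are listed.
    """
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))


def fingerprint_distance(coords_a, coords_b):
    """Euclidean distance between the fingerprints of two configurations."""
    fa, fb = fingerprint(coords_a), fingerprint(coords_b)
    if len(fa) != len(fb):
        raise ValueError("configurations must have the same number of atoms")
    return math.dist(fa, fb)


def rotate_z(coords, angle):
    """Rotate every atom about the z axis by `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift every atom by the vector `shift`."""
    return [tuple(p + d for p, d in zip(atom, shift)) for atom in coords]


# A small four-atom test configuration (arbitrary coordinates).
config = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.5)]

# Same structure after a rotation, a translation, and an index permutation.
moved = translate(rotate_z(config, 0.7), (2.0, -1.0, 0.5))
random.Random(0).shuffle(moved)

# A genuinely different structure: the last atom is displaced along z.
distorted = config[:3] + [(0.0, 0.0, 2.0)]

print(fingerprint_distance(config, moved))      # zero up to rounding error
print(fingerprint_distance(config, distorted))  # clearly nonzero
```

Because no alignment or index matching is needed, the comparison costs only the evaluation of the two fingerprints, whereas the RMSD requires a minimization over translations, rotations, and permutations.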
