Crystal structure prediction by data mining

Abstract: The ever-increasing number of experimentally determined crystal structures makes it possible to address crystallographic questions with data mining methods. Here we study the application of data mining to predicting the arrangement of molecules in unit cells of unknown dimensions (crystal structure prediction) as well as in unit cells of predetermined dimensions (fractional coordinate prediction). Data mining is used to derive an atom-pair potential, which is then compared to known force fields. The potential is shown to be physically reasonable when the data are sufficient in quality and quantity. For validation, the energy function is applied to both crystal structure prediction and fractional coordinate prediction. In both cases a large number of structures were generated and ranked according to their energies. A prediction was considered successful if a structure similar to the experimentally observed one was ranked highest. For crystal structure prediction, the energy function is tested on an independent set of crystal structures taken from space groups P1 and P-1. Approximately 76% of the 218 molecules tested in space group P1 are predicted correctly. For the more complex space group P-1 the success rate is 24%. If the powder diffraction pattern can be indexed, the problem simplifies to fractional coordinate prediction. Assuming known cell parameters, the structure was solved in 92% of the test cases for P1 and 29% for P-1.
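The ranking-based validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tabulated potential values, the bin width, the contact lists, and all function names are hypothetical, and "ranked highest" is assumed here to mean "lowest energy".

```python
# Hypothetical tabulated atom-pair potential: one energy per distance bin.
# Keys are alphabetically sorted atom-type pairs; all values are invented.
POTENTIAL = {
    ("C", "C"): [5.0, 1.0, -0.5, 0.0],
    ("C", "O"): [5.0, 0.5, -1.0, 0.0],
}
BIN_WIDTH = 1.0  # assumed bin width in angstroms


def pair_energy(potential, bin_width, contacts):
    """Sum tabulated energies over (type_a, type_b, distance) contacts."""
    total = 0.0
    for type_a, type_b, distance in contacts:
        key = tuple(sorted((type_a, type_b)))
        table = potential[key]
        # Distances beyond the table fall into the last bin.
        index = min(int(distance / bin_width), len(table) - 1)
        total += table[index]
    return total


def rank_candidates(potential, bin_width, candidates):
    """Return candidate names ordered from lowest to highest energy."""
    return sorted(
        candidates,
        key=lambda name: pair_energy(potential, bin_width, candidates[name]),
    )


def success_rate(top_ranked_is_native):
    """Fraction of test cases whose top-ranked structure matched experiment."""
    return sum(top_ranked_is_native) / len(top_ranked_is_native)


# Toy candidate structures, each reduced to a list of intermolecular contacts.
CANDIDATES = {
    "native_like": [("C", "O", 2.3), ("C", "C", 2.8)],
    "decoy": [("O", "C", 1.2), ("C", "C", 3.5)],
}

ranking = rank_candidates(POTENTIAL, BIN_WIDTH, CANDIDATES)
print(ranking)
```

In a real test, each entry of `top_ranked_is_native` would come from comparing the top-ranked structure with the experimental one under some similarity criterion, which this sketch leaves out.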
